1. Introduction


In the past, most optimal estimation problems have been studied in a vector space setting. While these results lend themselves to simple solutions in linear systems (refs. 1, 2) and in nonlinear systems with finite dimensional sensor orbits (ref. 3), no effective synthesis procedures for optimal estimation have been determined for large classes of nonlinear systems.

It  is  the  purpose  of  this  report  to  introduce  an  alternative  to  the 
vector  space  approach  in  analyzing  the  properties  of  nonlinear  stochastic 
processes.  We  will  study  random  processes  on  a  different  type  of  space, 
namely  a  differentiable  manifold,  which  is  the  natural  domain  for  certain 
nonlinear  problems  of  practical  importance.  This  approach  will  be  shown 
to  be  useful  both  in  analyzing  the  properties  of  certain  stochastic  processes 
and  in  deriving  recursive  optimal  estimation  equations  that  are  easily 
implemented  (for  instance,  see  the  block  diagram  in  Figure  4  and  the 
associated  discussion  in  subsection  3.  3). 

More specifically, we will concern ourselves with the study of random processes on the circle, $S^1$, and its extensions to higher dimensions. Topics such as FM demodulation, frequency stability, and single-degree-of-freedom gyroscopic analysis are well-known examples in this framework.

It is appropriate to remark that we will use several distinct representations of the circle interchangeably, depending upon which is most convenient. A point on the unit circle can be represented by either the angle $\theta \in [-\pi, \pi)$ it makes with a fixed reference point on the circle or by the 2x2 orthogonal matrix

$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$

Note that the addition of two angles $\theta_1$ and $\theta_2$ modulo $2\pi$ corresponds to the multiplication of the two matrices representing the points.

Another representation of $S^1$ is as the set of complex numbers of length one. Any such number can be uniquely written as $e^{i\theta}$ with $\theta \in [-\pi, \pi)$, and the relationship with the above representations is obvious.

Finally, there exists a natural projection from $R^1$ to $S^1$, identified with $[-\pi, \pi)$:

$$x \mapsto x \bmod 2\pi.$$

As Figure 1 indicates, two points $x_1$ and $x_2$ are projected onto the same point if and only if they differ by an integral multiple of $2\pi$ (that is, $e^{ix_1} = e^{i(x_2 + 2n\pi)}$). Thus we divide the real numbers into equivalence classes $\{x + 2n\pi \mid n \in Z\}$, and to each element of $S^1$ there corresponds a unique equivalence class, with different points in $S^1$ corresponding to different equivalence classes. Thus we can represent $S^1$ by this set of equivalence classes, denoted $R^1/2\pi Z$.
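The equivalence of these representations can be checked numerically. The following is a minimal sketch, not part of the report: the helper names `wrap`, `rotation`, and `matmul` are illustrative assumptions. It verifies that adding two angles modulo $2\pi$ agrees with multiplying the corresponding rotation matrices and the corresponding unit complex numbers.

```python
import cmath
import math

def wrap(x):
    """Project a real number onto [-pi, pi), i.e. the map x -> x mod 2*pi."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi

def rotation(theta):
    """2x2 orthogonal matrix representing the point at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Two angles whose ordinary sum leaves [-pi, pi) (values chosen for illustration).
theta1, theta2 = 2.5, 1.9
theta_sum = wrap(theta1 + theta2)

# The same composition in the matrix and complex-number representations.
matrix_product = matmul(rotation(theta1), rotation(theta2))
complex_product = cmath.exp(1j * theta1) * cmath.exp(1j * theta2)
```

Here `matrix_product` matches `rotation(theta_sum)` and `complex_product` matches `cmath.exp(1j * theta_sum)` up to rounding, which is the statement that all three representations carry the same group structure.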

Throughout  most  of  this  report  the  first  two  representations  will  be 
used.  However,  in  Section  5  we  will  use  the  complex  number  representation 
and  in  Section  4  we  will  make  use  of  the  interpretation  given  by  the  last 
representation  above. 


Figure 1. Illustrating the Projection Map


Consider the situation depicted in Figure 2. We have a unit circle in $R^2$ with a straight line of infinite length tangent to it.

Figure 2. (Figure label: 1-D Brownian motion.)

We allow the line to perform a one-dimensional Brownian motion, fix the center of the circle, and require that there be no slipping at the point of tangency. The line induces a rotation of the circle, and, if the line moves a distance $x$, the circle rotates $x$ radians, and is thus in a position which is $x \bmod 2\pi = \theta$ radians away from its initial position.

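The probability density function for $\theta$ satisfies the classical heat (diffusion, Fokker-Planck) equation on the circle:

$$\frac{\partial p_\theta(\xi, t)}{\partial t} = \frac{1}{2} \frac{\partial^2 p_\theta(\xi, t)}{\partial \xi^2} \tag{1}$$

with the periodicity condition

$$p_\theta(\xi, t) = p_\theta(\xi + 2\pi, t) \tag{2}$$

and initial condition

$$p_\theta(\xi, 0) = \delta(\xi - \eta), \tag{3}$$

where the initial orientation of the circle is $\eta$ radians from some reference position. The solution of (1), (2), and (3) is widely known, and is given by the two equivalent expressions

$$p_\theta(\xi, t) = \sum_{n=-\infty}^{+\infty} \frac{1}{\sqrt{2\pi t}} \exp\left[-\frac{(\xi + 2n\pi - \eta)^2}{2t}\right] = \frac{1}{2\pi} + \frac{1}{\pi} \sum_{n=1}^{\infty} e^{-n^2 t/2} \cos n(\xi - \eta). \tag{4}$$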


The density in (4) will be called the folded normal density. We give it this name for the following reason: if $x$ is a normal random variable with mean $\eta$ and variance $\gamma$, and if we let $\theta = x \bmod 2\pi$, then the density for $\theta$, $p_\theta$, is given by

$$p_\theta(\xi) = \sum_{n=-\infty}^{+\infty} \frac{1}{\sqrt{2\pi\gamma}} \exp\left[-\frac{(\xi + 2n\pi - \eta)^2}{2\gamma}\right] = F(\xi; \eta, \gamma).$$

Levy and Perrin have done extensive work with this density.
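The two expressions for the folded normal density (the sum of shifted normal densities and the Fourier cosine series) can be compared numerically. The sketch below is illustrative only; the function names and the truncation lengths `terms` are assumptions chosen so that the neglected tails are negligible for the variances used.

```python
import math

def folded_normal_sum(xi, eta, gamma, terms=50):
    """Folded normal density as a sum of normal densities shifted by 2*pi*n."""
    total = 0.0
    for n in range(-terms, terms + 1):
        total += math.exp(-((xi + 2.0 * math.pi * n - eta) ** 2) / (2.0 * gamma))
    return total / math.sqrt(2.0 * math.pi * gamma)

def folded_normal_fourier(xi, eta, gamma, terms=200):
    """Same density written as a Fourier cosine series about the mode eta."""
    total = 1.0 / (2.0 * math.pi)
    for n in range(1, terms + 1):
        total += math.exp(-n * n * gamma / 2.0) * math.cos(n * (xi - eta)) / math.pi
    return total

# Compare the two forms at a few points (eta and gamma chosen for illustration).
for xi in (-3.0, 0.0, 0.7, 2.9):
    print(xi, folded_normal_sum(xi, 0.7, 0.5), folded_normal_fourier(xi, 0.7, 0.5))
```

In practice the shifted-normal sum converges quickly for small variance and the Fourier form converges quickly for large variance, so an implementation might switch between them depending on $\gamma$.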

Using this concept of "wrapping" a random process around the circle, we formulate the mathematical model of an observation process that can be described by a bilinear matrix Ito stochastic equation. Let $m$ be a random process on $R^1$, and define $z$ by

$$dz(t) = m(t)\,dt + dw(t)$$

where $w$ is a Brownian motion process independent of $m$. Consider the associated process

$$\theta(t) = z(t) \bmod 2\pi.$$

Since knowledge of $\sin\theta$ and $\cos\theta$ is equivalent to knowledge of $\theta$, we wish to find an equation for

$$Z(t) = \begin{bmatrix} \cos\theta(t) & \sin\theta(t) \\ -\sin\theta(t) & \cos\theta(t) \end{bmatrix}.$$

As will be shown,

$$dZ(t) = Z(t) \begin{bmatrix} -\tfrac{1}{2}\,dt & m(t)\,dt + dw(t) \\ -m(t)\,dt - dw(t) & -\tfrac{1}{2}\,dt \end{bmatrix} \tag{5}$$

where the $-\tfrac{1}{2}\,dt$ terms are the second order correction terms given by Ito stochastic calculus. These terms are precisely what is needed to ensure (in the Ito sense) that $Z(t)$ remains an orthogonal matrix.
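The role of the second order correction terms in (5) can be illustrated with a crude discretization. The sketch below is an assumption-laden illustration, not the report's derivation: it takes $m(t) = 0$, replaces each Brownian increment by $\pm\sqrt{dt}$ (so that $dw^2 = dt$ up to rounding), and tracks the determinant of $Z$, which equals 1 for a rotation matrix. The names `step`, `det`, and `simulate` are hypothetical.

```python
import math
import random

def step(z, dz, dt, with_correction=True):
    """One Euler step Z <- Z (I + A), A = [[-dt/2, dz], [-dz, -dt/2]]."""
    d = -0.5 * dt if with_correction else 0.0
    a = [[1.0 + d, dz], [-dz, 1.0 + d]]
    return [[sum(z[i][k] * a[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(z):
    return z[0][0] * z[1][1] - z[0][1] * z[1][0]

def simulate(t_final=1.0, n_steps=10000, with_correction=True, seed=0):
    """Return det Z(t_final); it stays near 1 only with the correction term."""
    rng = random.Random(seed)
    dt = t_final / n_steps
    z = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n_steps):
        # Binomial stand-in for a Brownian increment: dw**2 equals dt up to rounding.
        dw = math.sqrt(dt) * rng.choice((-1.0, 1.0))
        z = step(z, dw, dt, with_correction)  # m(t) = 0 for simplicity
    return det(z)

print(simulate(with_correction=True), simulate(with_correction=False))
```

With the correction, each step multiplies the determinant by $1 + dt^2/4$, so $Z$ stays on the circle as $dt \to 0$; without it, each step multiplies the determinant by $1 + dt$, and the determinant grows like $e^{t}$.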

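If we assume $z(0) = 0$, we can write

$$z(t) = \int_0^t m(s)\,ds + w(t),$$

and then

$$Z(t) = \begin{bmatrix} \cos w(t) & \sin w(t) \\ -\sin w(t) & \cos w(t) \end{bmatrix} \begin{bmatrix} \cos\left[\int_0^t m(s)\,ds\right] & \sin\left[\int_0^t m(s)\,ds\right] \\ -\sin\left[\int_0^t m(s)\,ds\right] & \cos\left[\int_0^t m(s)\,ds\right] \end{bmatrix} \tag{6}$$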


and, in this form, we see that the disturbance is multiplicative in nature. In this report we will examine multiplicative noise problems such as this and will derive estimation equations for them. In Section 2 we will examine various error criteria for the optimal estimation of random variables on the circle. Section 3 deals with continuous time estimation of a class of stochastic processes on the circle, and Section 4 discusses the discrete time problem. Applications of this theory to AM and FM demodulation, optical communication, frequency stability, and estimation of the orientation of a spinning body are discussed in Section 5. In addition, an appendix is included, in which the relationship between the discrete and continuous time problems of Sections 3 and 4 is discussed.




2.  Error  Criteria  and  Optimal  Estimates 

In  the  following  sections,  we  will  study  the  properties  of  certain 
stochastic  processes  on  the  circle  and  will  derive  equations  for  probability 
distributions  conditioned  on  observations.  The  question  of  optimal 
estimation  will  be  of  central  importance  in  Sections  3  and  4.  Thus  it 
became  necessary  to  study  how  one  uses  the  knowledge  of  the  probability 
distribution  of  the  quantity  to  be  estimated  to  choose  an  estimate  that  gives 
the  "best"  performance,  as  measured  by  some  pre-determined  figure  of 
merit. 

In this section, we will present a number of results on the optimal estimation of random variables taking values on the circle. We assume that we are given a random variable $\theta$ taking on values in $[-\pi, \pi)$, with probability density $p(\theta)$, which is assumed to be periodic with period $2\pi$. Also, we assume that we have an error function $\phi$, also periodic with period $2\pi$, and we wish to choose $\hat\theta$ to minimize

$$\mathcal{E}(\phi(\theta - \hat\theta)) = \int_{-\pi}^{\pi} \phi(\theta - \hat\theta)\, p(\theta)\, d\theta.$$

This is precisely the $S^1$ analog of the vector space optimal estimation problem.

The motivation throughout this section is to provide simple methods for computing the minimum of the cost criterion, $\mathcal{E}(\phi(\theta - \hat\theta))$, and the value $\hat\theta$ that achieves this minimum. In this light, a number of special cases (i.e. particular families of densities and error functions) are considered in detail.

The first subsection presents a basic result, analogous to Sherman's results (refs. 10, 11), on optimal estimates for a large class of error criteria, but for the rather special case of unimodal probability density functions. However, it is shown that the important folded normal density falls into this class.

The  second  subsection  deals  with  the  more  general  estimation 
problem,  in  which  the  density  need  not  be  unimodal  and  the  error  function 
may  have  a  more  general  shape.  Fourier  series  is  the  basic  tool  of  this 
section.  The  third  subsection  contains  detailed  analysis  for  the  special 
cases  of  the  folded  normal  density  and  a  linear  combination  of  folded 
normal  densities. 


2.1 Symmetric Criteria and Unimodal Distributions

We define the standard distance function (Riemannian metric) on the circle, i.e. the distance, $\rho$, between two points on the circle is the arc length of the shortest path (geodesic line) joining them. If we restrict $\theta_1$ and $\theta_2$ to take values in the range $[-\pi, \pi)$, we have

$$\rho(\theta_1, \theta_2) = \min(|\theta_1 - \theta_2|,\; 2\pi - |\theta_1 - \theta_2|).$$
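As a small illustration of this distance function, the following sketch (the name `geodesic_distance` is an assumption, not from the report) evaluates $\rho$ for angles restricted to $[-\pi, \pi)$.

```python
import math

def geodesic_distance(theta1, theta2):
    """Arc length of the shortest path between two angles in [-pi, pi)."""
    diff = abs(theta1 - theta2)
    return min(diff, 2.0 * math.pi - diff)

# Points near -pi and +pi are close on the circle even though the angles differ by 6.
print(geodesic_distance(-3.0, 3.0))   # 2*pi - 6, roughly 0.283
print(geodesic_distance(0.5, -0.5))   # 1.0
```

The first call shows the topological point that separates $S^1$ from $R^1$: the angles $-3.0$ and $3.0$ are far apart as real numbers but nearly coincide on the circle.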

The class of error criteria we wish to consider is the class of symmetric, nondecreasing cost functions, i.e. functions $\phi: S^1 \to R$ which satisfy

$$0 \le \phi(\theta) = \phi(-\theta), \qquad 0 \le \rho(\theta_1, 0) \le \rho(\theta_2, 0) \implies \phi(\theta_1) \le \phi(\theta_2). \tag{7}$$

Some examples of cost criteria satisfying (7) are $\rho(\theta) = \rho(\theta, 0)$, $(1 - \cos\theta)$, $\rho^2(\theta)$, $(1 - \cos\theta)^2$. We also wish to consider the special class of unimodal, mode-symmetric probability density functions, i.e. density functions of the form $p: S^1 \to [0, \infty)$ with a unique maximum at $\eta$, such that

$$p(\eta + \theta) = p(\eta - \theta) \quad \forall \theta.$$

As the following theorem demonstrates, under these conditions the mode of the density is the optimal estimate.

Theorem 1: Given an error function $\phi$ that satisfies (7) and a unimodal, mode-symmetric probability density function $p$, then

$$\mathcal{E}(\phi(\theta - \eta)) \le \mathcal{E}(\phi(\theta - a)) \quad \forall a,$$

where $p$ has its maximum at $\eta$.

Proof: The theorem follows immediately from results on similarly ordered functions and the rearrangement inequalities. The basic result for real valued functions defined on $R^1$ is contained in Hardy, Littlewood and Polya (thm. 378) and Szego and Polya (p. 183). The result for $S^1$ is obtained by making only minor changes in these proofs. ■
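Theorem 1 can be checked numerically for particular choices of density and cost. The sketch below is an illustration under stated assumptions, not a proof: it uses the folded normal density in its Fourier form (with an assumed variance of 0.8 and mode $\pi/4$), evaluates the expected cost on a grid of candidate estimates by the midpoint rule, and reports the minimizer for the costs $1 - \cos\theta$ and $\rho^2(\theta)$. All function names are hypothetical.

```python
import math

def folded_normal(theta, eta, gamma, terms=40):
    """Folded normal density (Fourier cosine series form)."""
    total = 1.0 / (2.0 * math.pi)
    for n in range(1, terms + 1):
        total += math.exp(-n * n * gamma / 2.0) * math.cos(n * (theta - eta)) / math.pi
    return total

def rho(theta):
    """Geodesic distance from theta to 0, for any real theta."""
    wrapped = (theta + math.pi) % (2.0 * math.pi) - math.pi
    return abs(wrapped)

def expected_costs(phi, eta, gamma, n_grid=360, n_candidates=72):
    """Midpoint-rule expected cost for each candidate estimate on a uniform grid."""
    h = 2.0 * math.pi / n_grid
    thetas = [-math.pi + (k + 0.5) * h for k in range(n_grid)]
    density = [folded_normal(t, eta, gamma) for t in thetas]
    candidates = [-math.pi + 2.0 * math.pi * k / n_candidates for k in range(n_candidates)]
    costs = [h * sum(phi(t - a) * p for t, p in zip(thetas, density)) for a in candidates]
    return candidates, costs

mode = -math.pi + 2.0 * math.pi * 45 / 72  # pi/4, chosen to lie on the candidate grid
for phi in (lambda t: 1.0 - math.cos(t), lambda t: rho(t) ** 2):
    candidates, costs = expected_costs(phi, mode, gamma=0.8)
    best = candidates[costs.index(min(costs))]
    print(best, mode)
```

For both cost functions the grid minimizer coincides with the mode, as the theorem predicts for symmetric nondecreasing criteria.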

We remark that from the symmetry of the problem, $\phi$ has its global maximum at $\pi$ and $p$ has its global minimum at $\eta + \pi$. Thus

$$\mathcal{E}(\phi(\theta - \eta + \pi)) \ge \mathcal{E}(\phi(\theta - a)) \quad \forall a.$$

It should be noted that Theorem 1 is the $S^1$ analog of a result of Sherman (refs. 10, 11). Note that the same result is true if a probability density doesn't exist, but the probability measure is unimodal at, and symmetric about, some point $\eta$. Here we define these concepts as follows: let $\theta$ be a random variable on $S^1$ and define the distribution function $F: [-\pi, \pi] \to [0, 1]$ by

$$F(a) = \Pr(\theta \in [-\pi, a]).$$

Then $F$ is unimodal at, and symmetric about, 0 if it is convex for $a \in [-\pi, 0)$, and if

$$F(a) = 1 - F(-a)$$

at each continuity point of $F$ (see ref. 10).

In  the  continuous  time  problem  discussed  in  Section  3  and  the  discrete 
time  problem  of  Section  4,  the  folded  normal  distribution  will  play  an 
important  role,  and  for  this  density  we  have  the  following  result  which 
shows  that  Theorem  1  holds  for  the  folded  normal  density. 

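Theorem 2: The folded normal density

$$F(\theta; \eta, \gamma) = \sum_{n=-\infty}^{\infty} \frac{1}{\sqrt{2\pi\gamma}} \exp\left[-\frac{(\theta - \eta + 2n\pi)^2}{2\gamma}\right] = \frac{1}{2\pi} + \frac{1}{\pi} \sum_{n=1}^{\infty} e^{-n^2\gamma/2} \cos n(\theta - \eta) \tag{8}$$

is unimodal with mode at $\theta = \eta$ and is symmetric about $\eta$.

Proof: Since $\cos\theta \le 1$, the second form of $F$ in (8) yields

$$F(\theta; \eta, \gamma) \le \frac{1}{2\pi} + \frac{1}{\pi} \sum_{n=1}^{\infty} e^{-n^2\gamma/2} = F(\eta; \eta, \gamma).$$

Thus $F$ has its global maximum at $\theta = \eta$.

Since $F(\theta; \eta, \gamma) = F(\theta - \eta; 0, \gamma)$, we need only show that $F(\theta; 0, \gamma)$ is symmetric about 0 and monotone decreasing as $\rho(\theta, 0)$ increases. Symmetry is obvious ($\cos n\theta = \cos n(-\theta)$), and monotonicity will follow if we can show

$$\frac{\partial F}{\partial\theta}(\theta; 0, \gamma) < 0, \quad \theta \in (0, \pi) \tag{9a}$$

$$\frac{\partial F}{\partial\theta}(\theta; 0, \gamma) > 0, \quad \theta \in (-\pi, 0). \tag{9b}$$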

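We now remark that the properties of $F(\theta; 0, \gamma)$ have been studied extensively, since it is a theta function. See refs. 12 and 13 for discussions of some properties of theta functions. Using the notation of ref. 12, pp. 2, 42, we have

$$F(\theta; 0, \gamma) = \frac{1}{2\pi}\,\vartheta_3\!\left(\tfrac{1}{2}\theta\right) = k \prod_{n=1}^{\infty} \left(1 + 2q^{2n-1}\cos\theta + q^{4n-2}\right), \tag{10}$$

where

$$q = e^{-\gamma/2}$$

and

$$k = \frac{1}{2\pi} \prod_{n=1}^{\infty} \left(1 - q^{2n}\right).$$

Using the fact that $F > 0$ and the form of $F$ given by (10), we have

$$\frac{\dfrac{\partial F}{\partial\theta}(\theta; 0, \gamma)}{F(\theta; 0, \gamma)} = -\sin\theta \left[\sum_{n=1}^{\infty} \frac{2q^{2n-1}}{1 + 2q^{2n-1}\cos\theta + q^{4n-2}}\right]. \tag{11}$$

It is easily seen that the term in square brackets on the right hand side of (11) is positive for all values of $\theta$ and thus (9) is correct. ■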
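The infinite product representation used in this proof is the Jacobi triple product identity, and it can be compared with the cosine series numerically. The sketch below is illustrative: the function names and truncation lengths are assumptions, and the variance $\gamma = 1$ is chosen arbitrarily.

```python
import math

def series_form(theta, gamma, terms=200):
    """F(theta; 0, gamma) from the Fourier cosine series."""
    total = 1.0
    for n in range(1, terms + 1):
        total += 2.0 * math.exp(-n * n * gamma / 2.0) * math.cos(n * theta)
    return total / (2.0 * math.pi)

def product_form(theta, gamma, terms=200):
    """F(theta; 0, gamma) from the infinite product with q = exp(-gamma / 2)."""
    q = math.exp(-gamma / 2.0)
    value = 1.0 / (2.0 * math.pi)
    for n in range(1, terms + 1):
        value *= 1.0 - q ** (2 * n)
        value *= 1.0 + 2.0 * q ** (2 * n - 1) * math.cos(theta) + q ** (4 * n - 2)
    return value

for theta in (0.0, 0.5, 1.5, 3.0):
    print(theta, series_form(theta, 1.0), product_form(theta, 1.0))
```

Every factor of the product is positive and decreases as $\theta$ moves from 0 toward $\pi$, which gives a direct way to see the unimodality that the theorem asserts.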

Some work along these lines has been done by Perrin (ref. 5). See ref. 15 for discussions of other relevant properties of theta functions, hypergeometric functions, Legendre polynomials, and Tchebycheff polynomials.

Note that the symmetry requirements of Theorem 1 are necessary. For instance, if $\phi$ is not symmetric, the mode of the density need not be the optimal estimate even if all the other assumptions of Theorem 1 do hold. As an example, consider the function $\phi: S^1 \to R$

$$\phi(\theta) = \begin{cases} \ldots & 0 \le \theta < \pi \\ \ldots & -\pi \le \theta < 0 \end{cases}$$

Suppose our distribution is the folded normal centered at 0. Then it can be shown that the mode, 0, is not the optimal estimate.

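2.2 Optimal Estimation Using Fourier Series

If we do not have a unimodal distribution or a symmetric cost criterion that increases away from 0, Theorem 1 doesn't apply, but, with the aid of Fourier series, we can still do some useful analysis. We assume that our probability density is given in Fourier series form

$$p(\theta) = \frac{1}{2\pi} + \sum_{n=1}^{\infty} a_n \sin n\theta + b_n \cos n\theta$$

as is our error function

$$\phi(\theta) = d_0 + \sum_{n=1}^{\infty} c_n \sin n\theta + d_n \cos n\theta.$$

Our problem is to choose $\hat\theta$ to minimize $\mathcal{E}(\phi(\theta - \hat\theta))$. A simple computation yields

$$\mathcal{E}(\phi(\theta - \hat\theta)) = d_0 + \pi \sum_{n=1}^{\infty} \left\{ a_n (c_n \cos n\hat\theta + d_n \sin n\hat\theta) + b_n (d_n \cos n\hat\theta - c_n \sin n\hat\theta) \right\}.$$

Thus, necessary conditions for a local minimum are

$$\frac{\partial}{\partial\hat\theta}\, \mathcal{E}(\phi(\theta - \hat\theta)) = 0 \;\Longrightarrow\; \sum_{n=1}^{\infty} \left\{ n a_n [d_n \cos n\hat\theta - c_n \sin n\hat\theta] - n b_n [d_n \sin n\hat\theta + c_n \cos n\hat\theta] \right\} = 0 \tag{13}$$

$$\frac{\partial^2}{\partial\hat\theta^2}\, \mathcal{E}(\phi(\theta - \hat\theta)) \ge 0 \;\Longrightarrow\; \sum_{n=1}^{\infty} \left\{ -n^2 a_n [d_n \sin n\hat\theta + c_n \cos n\hat\theta] + n^2 b_n [c_n \sin n\hat\theta - d_n \cos n\hat\theta] \right\} \ge 0 \tag{14}$$

Solutions of (13) and (14) are candidates for the optimal estimate.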

Explicit solution of (13) and (14) is possible only for certain error functions. For example, suppose we consider the function

$$\phi_1(\theta) = 1 - \cos\theta.$$

Then $d_0 = 1$, $d_1 = -1$, and all other Fourier coefficients are 0. Then

$$\mathcal{E}(\phi_1(\theta - \hat\theta)) = 1 - \pi (a_1 \sin\hat\theta + b_1 \cos\hat\theta) \tag{15}$$

and equations (13) and (14) become

$$a_1 \cos\hat\theta - b_1 \sin\hat\theta = 0 \tag{16}$$

$$a_1 \sin\hat\theta + b_1 \cos\hat\theta \ge 0 \tag{17}$$

If $a_1 = b_1 = 0$, $\mathcal{E}(\phi_1(\theta - \hat\theta))$ is independent of $\hat\theta$. In any other case, there are two inequivalent solutions to (16), where two solutions are considered equivalent if they differ by a multiple of $2\pi$. The two solutions are

$$\hat\theta = \tan^{-1}(a_1/b_1), \quad \tan^{-1}(a_1/b_1) + \pi$$

where $\tan^{-1}: [-\infty, \infty] \to [-\pi/2, \pi/2]$. Examination of (15) and (17) yields a method for choosing the proper solution:

    a_1 > 0, b_1 > 0:  choose solution in first quadrant
    a_1 > 0, b_1 < 0:  choose solution in second quadrant
    a_1 < 0, b_1 < 0:  choose solution in third quadrant
    a_1 < 0, b_1 > 0:  choose solution in fourth quadrant.

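With these choices, it is easy to see that

$$a_1 \sin\hat\theta_0 + b_1 \cos\hat\theta_0 = \sqrt{a_1^2 + b_1^2}$$

and

$$\mathcal{E}(\phi_1(\theta - \hat\theta_0)) = 1 - \pi \sqrt{a_1^2 + b_1^2}, \tag{18}$$

where $\hat\theta_0$ is the optimal value.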
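The quadrant rule and the optimal cost (18) amount to a two-argument arctangent of the first-mode coefficients. The sketch below is a minimal illustration under assumptions: the example density, the midpoint-rule quadrature, and the function names are not from the report.

```python
import math

def first_mode_coefficients(density, n_grid=2000):
    """Midpoint-rule estimate of a_1 and b_1 in p = 1/(2 pi) + sum(a_n sin + b_n cos)."""
    h = 2.0 * math.pi / n_grid
    a1 = b1 = 0.0
    for k in range(n_grid):
        theta = -math.pi + (k + 0.5) * h
        p = density(theta)
        a1 += p * math.sin(theta) * h / math.pi
        b1 += p * math.cos(theta) * h / math.pi
    return a1, b1

def optimal_estimate(a1, b1):
    """Optimal estimate and cost for the error function 1 - cos(theta)."""
    # atan2 picks the quadrant from the signs of a1 and b1; it returns values in
    # (-pi, pi], so the single value pi would need mapping to -pi for [-pi, pi).
    theta_hat = math.atan2(a1, b1)
    cost = 1.0 - math.pi * math.hypot(a1, b1)
    return theta_hat, cost

# Illustrative density with a_1 = 0.1 and b_1 = -0.05 (second-quadrant case).
density = lambda t: 1.0 / (2.0 * math.pi) + 0.1 * math.sin(t) - 0.05 * math.cos(t)
a1, b1 = first_mode_coefficients(density)
theta_hat, cost = optimal_estimate(a1, b1)
print(theta_hat, cost)
```

Using `atan2` avoids the separate sign analysis, since it returns the solution of (16) that also satisfies (17).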

Thus, in this case, we can explicitly solve the estimation problem in terms of the first mode Fourier coefficients. Note that the higher modes play no role in this particular case, but also note that $\phi_1$ has some motivation from standard vector space theory, in that for small values of $\theta$,

$$\phi_1(\theta) = 1 - \cos\theta \approx \tfrac{1}{2}\theta^2.$$

Another possible error function, one that involves the first and second modes of the density, is

$$\phi_2(\theta) = (1 - \cos\theta)^2 = \tfrac{3}{2} - 2\cos\theta + \tfrac{1}{2}\cos 2\theta.$$

Using the same type of approach as before, one can reduce the problem of finding the optimal estimate to the solution of a quartic polynomial equation and the calculation of several functions, a procedure that can be done easily by computer. However, the complexity, even when we just add in the second mode, is such that no closed form for the optimal error in terms of the Fourier coefficients is available.

As can be seen, the error analysis becomes increasingly difficult as the number of nonzero Fourier coefficients increases. For example, direct application of these ideas if $\phi = \rho$ or $\rho^2$, where $\rho$ is the Riemannian metric on $S^1$ (actually $\rho(\theta) = \rho(\theta, 0)$), yields extremely complicated equations. However, the behavior of the Fourier coefficients for these two examples suggests truncating the series and applying techniques such as those used in the analysis for $(1 - \cos\theta)$ and $(1 - \cos\theta)^2$.

However, for these special functions we can use a different method in trying to find the optimal estimate. Consider the function $\rho^2$. We have the equation

$$\rho^2(\theta) = \theta^2, \quad -\pi \le \theta < \pi.$$

Thus, if our probability density is $p(\theta)$,

$$\mathcal{E}(\rho^2(\theta - \hat\theta)) = \int_{-\pi+\hat\theta}^{\pi+\hat\theta} (\theta - \hat\theta)^2\, p(\theta)\, d\theta.$$

Using Leibnitz's rule and the periodicity of $p$, we have the following necessary conditions for optimality:

$$\frac{d}{d\hat\theta}\, \mathcal{E}(\rho^2(\theta - \hat\theta)) = 2\hat\theta - 2 \int_{-\pi+\hat\theta}^{\pi+\hat\theta} \theta\, p(\theta)\, d\theta = 0 \tag{19}$$

$$\frac{d^2}{d\hat\theta^2}\, \mathcal{E}(\rho^2(\theta - \hat\theta)) = 2 - 4\pi\, p(\hat\theta + \pi) \ge 0. \tag{20}$$

Equations (19) and (20) offer an alternate method for solving for $\hat\theta_0$. Note that equation (19) resembles the necessary condition for the least squares estimate on $R^1$. In that case

$$x_0 = \mathcal{E}(x) = \int_{-\infty}^{\infty} x\, p(x)\, dx,$$

where $p(x)$ is the density function. However, in this case, essentially because of the topological difference between $S^1$ and $R^1$, the integral

$$\int_{-\pi+\hat\theta}^{\pi+\hat\theta} \theta\, p(\theta)\, d\theta$$

is not independent of $\hat\theta$, and thus cannot be called $\mathcal{E}(\theta)$.
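The dependence of this windowed integral on the estimate can be seen numerically. The following sketch is illustrative only: it assumes a folded normal density with mode 0.7 and variance 0.5, uses a midpoint rule, and the function names are hypothetical. It evaluates the integral in the first-order condition for two different windows and the second-order quantity $2 - 4\pi p(\hat\theta + \pi)$ at the mode.

```python
import math

def folded_normal(theta, eta, gamma, terms=60):
    """Folded normal density (Fourier cosine series form)."""
    total = 1.0 / (2.0 * math.pi)
    for n in range(1, terms + 1):
        total += math.exp(-n * n * gamma / 2.0) * math.cos(n * (theta - eta)) / math.pi
    return total

def windowed_mean(estimate, eta, gamma, n_grid=2000):
    """Integral of theta * p(theta) over [estimate - pi, estimate + pi]."""
    h = 2.0 * math.pi / n_grid
    total = 0.0
    for k in range(n_grid):
        theta = estimate - math.pi + (k + 0.5) * h
        total += theta * folded_normal(theta, eta, gamma) * h
    return total

eta, gamma = 0.7, 0.5
at_mode = windowed_mean(eta, eta, gamma)   # window centered on the mode
at_zero = windowed_mean(0.0, eta, gamma)   # window centered on 0
second_order = 2.0 - 4.0 * math.pi * folded_normal(eta + math.pi, eta, gamma)
print(at_mode, at_zero, second_order)
```

With the window centered on the mode the integral returns the mode itself, so the first-order condition is satisfied there; with the window centered elsewhere the value changes, because probability mass that falls outside the window re-enters from the other side shifted by $2\pi$.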
2.3 The Folded Normal Density and Its Linear Combinations

As will be seen in the next two sections, two types of probability densities are of great importance. The first of these is the folded normal density, $F(\theta; \eta, \gamma)$, and the second is a linear combination of such densities

$$p(\theta) = \sum_{n=1}^{\infty} c_n F(\theta; \eta_n, \gamma_n) \tag{21}$$

$$\sum_{n=1}^{\infty} c_n = 1, \quad c_n \ge 0.$$

It should be noted that it has been shown (ref. 3) that the set of densities given by (21) with only finitely many nonzero $c_n$'s is dense in $L^1[-\pi, \pi)$, and this is still true if all the $\gamma_n$'s are equal to some fixed $\gamma$. In this section we do not require that only finitely many $c_n$'s be unequal to zero. The reason for this will be seen in Section 4.

For the case where our density $p(\theta)$ is a single folded normal density, $F(\theta; \eta, \gamma)$, we know that the optimal estimate for any function $\phi$ satisfying (7) is the mode, $\eta$. However, for this special density, we can say a great deal more. Let us consider a more general class of error functions. We remove the symmetry requirement but still require that $\phi$ be increasing on $[0, \pi]$ and decreasing on $[-\pi, 0]$. For such a $\phi$, the mode $\eta$ need not be the optimal estimate; however, for this discussion we will take it as our estimate. The following theorem reveals an important property of the error $\mathcal{E}(\phi(\theta - \eta))$.

Theorem 3: For $\phi$ satisfying the above requirement, and $p(\theta) = F(\theta; \eta, \gamma)$, $\mathcal{E}(\phi(\theta - \eta))$ is an increasing function of the variance, $\gamma$; that is,

$$\frac{\partial}{\partial\gamma}\, \mathcal{E}(\phi(\theta - \eta)) \ge 0. \tag{22}$$

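Proof: Writing

$$\phi(\theta) = d_0 + \sum_{n=1}^{\infty} c_n \sin n\theta + d_n \cos n\theta$$

and using the results on Fourier series analysis,

$$\mathcal{E}(\phi(\theta - \eta)) = d_0 + \sum_{n=1}^{\infty} d_n e^{-n^2\gamma/2}, \tag{23}$$

but we get the same error if we compute $\mathcal{E}(\psi(\theta - \eta))$, where $\psi$ is the function satisfying (7) defined by

$$\psi(\theta) = \tfrac{1}{2}\left(\phi(\theta) + \phi(-\theta)\right).$$

Thus, it is enough to prove the theorem for $\phi$ satisfying (7). In this case $\eta$ is the optimal estimate and

$$\mathcal{E}(\phi(\theta - \eta)) = \int_{-\pi}^{\pi} \phi(\theta - \eta) F(\theta; \eta, \gamma)\, d\theta = \int_{-\pi}^{\pi} \phi(\theta) F(\theta; 0, \gamma)\, d\theta = 2 \int_0^{\pi} \phi(\theta) F(\theta; 0, \gamma)\, d\theta.$$

Then, (22) will hold if

$$\frac{\partial}{\partial\gamma} \int_0^{\pi} \phi(\theta) F(\theta; 0, \gamma)\, d\theta \ge 0.$$

Suppose we can show that there exists $\theta_0 \in [0, \pi]$ such that

$$\frac{\partial F}{\partial\gamma}(\theta; 0, \gamma) < 0, \quad \theta \in [0, \theta_0)$$

$$\frac{\partial F}{\partial\gamma}(\theta_0; 0, \gamma) = 0$$

$$\frac{\partial F}{\partial\gamma}(\theta; 0, \gamma) > 0, \quad \theta \in (\theta_0, \pi].$$

Then, since

$$\phi(\theta) \le \phi(\theta_0), \quad \theta \in [0, \theta_0]$$

$$\phi(\theta) \ge \phi(\theta_0), \quad \theta \in [\theta_0, \pi],$$

we have

$$\frac{\partial}{\partial\gamma} \int_0^{\pi} \phi(\theta) F(\theta; 0, \gamma)\, d\theta \ge \phi(\theta_0)\, \frac{\partial}{\partial\gamma} \int_0^{\pi} F(\theta; 0, \gamma)\, d\theta = \phi(\theta_0)\, \frac{\partial}{\partial\gamma}\left(\tfrac{1}{2}\right) = 0$$

and we get a strict inequality if $\phi$ is not a constant. Now it is easy to see that

$$\frac{\partial}{\partial\gamma} F(\theta; 0, \gamma) = \frac{1}{2} \frac{\partial^2}{\partial\theta^2} F(\theta; 0, \gamma)$$

and the theorem will be proved once we prove the following lemma, which yields more information about the shape of the folded normal density.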
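Lemma 1: For an arbitrary but fixed value of $\gamma > 0$, there exists $\theta_0 \in [0, \pi]$ such that

$$\frac{\partial^2 F}{\partial\theta^2}(\theta; 0, \gamma) < 0, \quad \theta \in [0, \theta_0)$$

$$\frac{\partial^2 F}{\partial\theta^2}(\theta_0; 0, \gamma) = 0$$

$$\frac{\partial^2 F}{\partial\theta^2}(\theta; 0, \gamma) > 0, \quad \theta \in (\theta_0, \pi].$$

$$\ldots$$

$$\frac{\partial}{\partial\theta} \left( \frac{\dfrac{\partial^2 F}{\partial\theta^2}}{F} \right)(\theta; 0, \gamma) > 0 \quad \forall \theta \in \left(0, \tfrac{\pi}{2}\right).$$

These inequalities imply that there is a $\theta_0 \in (0, \tfrac{\pi}{2})$ such that $\dfrac{\partial^2}{\partial\theta^2} F(\theta_0; 0, \gamma) = 0$. Then for $\theta_0 < \theta_1 < \pi/2$

$$\frac{\dfrac{\partial^2}{\partial\theta^2} F(\theta_1; 0, \gamma)}{F(\theta_1; 0, \gamma)} > \frac{\dfrac{\partial^2}{\partial\theta^2} F(\theta_0; 0, \gamma)}{F(\theta_0; 0, \gamma)} = 0$$

or

$$\frac{\partial^2}{\partial\theta^2} F(\theta_1; 0, \gamma) > 0$$

and the lemma and the theorem are proved. ■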

Note that by symmetry we have that $F$ has a unique inflection point at $-\theta_0$ on the interval $[-\pi, 0]$.

Theorem 3 tells us that the intuitive notion that we "have more accurate information" for smaller values of $\gamma$ can be made precise. Also, this theorem implies another result, which is the $S^1$ analog of a problem treated by J. L. Brown (ref. 14). The problem treated in ref. 14 is that of finding the optimal linear filter minimizing an asymmetric error criterion on $R^1$ that decreases on $(-\infty, 0]$ and increases on $[0, \infty)$. The result is that the optimal linear filter is the minimum variance filter, and the proof essentially consists of showing that the error is an increasing function of the variance. Theorem 3 clearly implies an $S^1$ analog of this result.
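Theorem 3 can be illustrated numerically. The sketch below is an illustration under assumptions, not part of the proof: it evaluates the cost at the mode by a midpoint rule for the symmetric cost $1 - \cos\theta$ (for which (23) gives $1 - e^{-\gamma/2}$) and for a hypothetical asymmetric cost that increases on $[0, \pi)$ and decreases on $[-\pi, 0]$. By shift invariance the mode is taken to be 0.

```python
import math

def folded_normal_centered(theta, gamma, terms=60):
    """F(theta; 0, gamma) from the Fourier cosine series."""
    total = 1.0 / (2.0 * math.pi)
    for n in range(1, terms + 1):
        total += math.exp(-n * n * gamma / 2.0) * math.cos(n * theta) / math.pi
    return total

def cost_at_mode(phi, gamma, n_grid=1000):
    """Midpoint-rule value of the expected cost when the estimate is the mode."""
    h = 2.0 * math.pi / n_grid
    total = 0.0
    for k in range(n_grid):
        u = -math.pi + (k + 0.5) * h
        total += phi(u) * folded_normal_centered(u, gamma) * h
    return total

def asymmetric_cost(u):
    """Hypothetical cost: quadratic for u >= 0, linear (slope 2) for u < 0."""
    return u * u if u >= 0.0 else 2.0 * abs(u)

gammas = (0.1, 0.5, 1.0, 2.0, 4.0)
symmetric_costs = [cost_at_mode(lambda u: 1.0 - math.cos(u), g) for g in gammas]
asymmetric_costs = [cost_at_mode(asymmetric_cost, g) for g in gammas]
print(symmetric_costs)
print(asymmetric_costs)
```

Both lists increase with the variance, consistent with the theorem; the asymmetric example is only a spot check for one choice of cost.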

Some examples of cost criteria satisfying (7) and the associated optimal costs when the density is folded normal will be given in Section 3.
For the case in which $p(\theta)$ is given by (21), the situation is somewhat different and much more complicated, since we no longer have a unimodal probability density. For this case, we will examine the optimal estimation problem for two error functions, $1 - \cos\theta$ and $\rho^2(\theta)$.

As discussed in subsection 2.2, in trying to minimize $\mathcal{E}(1 - \cos(\theta - \hat\theta))$ with respect to $\hat\theta$, we need only know the lowest mode Fourier coefficients, $a_1$ and $b_1$. In this case

$$a_1 = \frac{1}{\pi} \sum_{n=1}^{\infty} c_n e^{-\gamma_n/2} \sin\eta_n, \qquad b_1 = \frac{1}{\pi} \sum_{n=1}^{\infty} c_n e^{-\gamma_n/2} \cos\eta_n$$

and (assuming $a_1$ and $b_1$ are not both zero) the optimal estimate $\hat\theta_0$ is either $\tan^{-1}(a_1/b_1)$ or $\tan^{-1}(a_1/b_1) + \pi$, depending upon the signs of $a_1$ and $b_1$. In any case, the optimal cost is given by

$$\mathcal{E}(1 - \cos(\theta - \hat\theta_0)) = 1 - \left\{ \left[\sum_{n=1}^{\infty} c_n e^{-\gamma_n/2} \sin\eta_n\right]^2 + \left[\sum_{n=1}^{\infty} c_n e^{-\gamma_n/2} \cos\eta_n\right]^2 \right\}^{1/2}.$$

In general, this optimal error is not an increasing function of each of the variances $\gamma_n$ individually. However, if all of the variances equal some value $\gamma$, it is easy to see that the optimal error is an increasing function of $\gamma$.
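For a finite mixture, the optimal estimate and cost under the $1 - \cos\theta$ criterion follow directly from the weighted sums above. The sketch below is illustrative; the function name `mixture_estimate` and the example mixtures are assumptions, and the sums are truncated to finitely many components.

```python
import math

def mixture_estimate(weights, modes, variances):
    """Optimal estimate and cost for 1 - cos(theta) under a folded normal mixture."""
    # s and c equal pi * a_1 and pi * b_1 in the notation of the text.
    s = sum(w * math.exp(-g / 2.0) * math.sin(eta)
            for w, eta, g in zip(weights, modes, variances))
    c = sum(w * math.exp(-g / 2.0) * math.cos(eta)
            for w, eta, g in zip(weights, modes, variances))
    theta_hat = math.atan2(s, c)          # quadrant chosen from the signs of s and c
    cost = 1.0 - math.hypot(s, c)
    return theta_hat, cost

# Single component: the estimate is the mode and the cost is 1 - exp(-gamma / 2).
single = mixture_estimate([1.0], [0.7], [0.5])

# Two components at opposite points of the circle: increasing the variance of the
# lighter component lowers the optimal cost.
narrow = mixture_estimate([0.6, 0.4], [0.0, math.pi], [0.5, 0.5])
wide = mixture_estimate([0.6, 0.4], [0.0, math.pi], [0.5, 2.0])
print(single, narrow, wide)
```

The second example illustrates why the optimal error need not increase with each variance individually: spreading out the competing component makes the dominant component's mode a better estimate.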
In the case of $\rho^2(\theta)$, we recall from subsection 2.2 that it was necessary to evaluate

$$\int_{-\pi+a}^{\pi+a} \theta\, p(\theta)\, d\theta$$

as a function of $a$. For $p$ a folded normal density, $F(\theta; \eta, \gamma)$, we have

$$\int_{-\pi+a}^{\pi+a} \theta\, p(\theta)\, d\theta = \eta - \sum_{k=-\infty}^{\infty} 2k\pi \int_{(2k-1)\pi}^{(2k+1)\pi} N(\theta; \eta - a, \gamma)\, d\theta \tag{25}$$

where $N$ is the normal density. The second term on the right-hand side of (25) involves various values of the error function, erf, and can be tabulated as a function of $\eta - a$ and $\gamma$. Then, if we call this term $g(\eta - a, \gamma)$, in the case where $p(\theta)$ is given by (21), necessary conditions for the optimal estimate are

$$\hat\theta = \sum_{n=1}^{\infty} c_n \int_{-\pi+\hat\theta}^{\pi+\hat\theta} \theta\, F(\theta; \eta_n, \gamma_n)\, d\theta, \quad \text{with each integral evaluated by (25)},$$

$$1 - 2\pi \sum_{n=1}^{\infty} c_n F(\hat\theta + \pi; \eta_n, \gamma_n) \ge 0.$$

There does not appear to be a simple formula for the optimal cost, nor is it clear whether or not the optimal cost is a monotone increasing function of the $\gamma_n$ or of $\gamma$, in the case where all of the $\gamma_n = \gamma$.


3.  Continuous  Time  Estimation 

A signal process and an observation process, taking values on $S^1$,
will be formulated in terms of bilinear Ito matrix differential equations.

The  conditional  probability  distribution  of  the  signal,  given  observations 
over  a  certain  period  of  time, will  be  evaluated.  Recursive  computational 
schemes  for  optimal  estimation  (filtering,  smoothing,  and  prediction),  with 
respect  to  the  error  criteria  defined  in  the  previous  section,  will  be 
derived.  In  fact  it  will  be  shown  that  optimal  estimates  on  S*  can  be 
obtained  recursively  by  the  use  of  an  ordinary  vector  space  estimator 
together  with  a  nonlinear  preprocessor  and  a  nonlinear  postprocessor,  as 
illustrated  in  Fig.  4.  Multichannel  estimation  on  abelian  Lie  groups  will 
be  examined.  Examples  illustrating  the  optimal  estimation  procedure  are 
given  at  the  end  of  this  section. 

The circle group, $S^1$, can be identified as the multiplication group of 2x2 orthogonal matrices of determinant +1. Any element of this group has the form

$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$

and, for $\theta$ near zero, we have the first order approximation

$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \approx \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \theta \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.$$

The matrix

$$R = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$$

is called the infinitesimal rotation, and we have

$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \exp(R\theta).$$

For those familiar with the theory of Lie groups, $S^1$ is a one dimensional abelian Lie group, with the 2x2 orthogonal matrices a representation of the group. The infinitesimal rotation $R$ forms a basis for the Lie algebra, $L(S^1)$, of $S^1$. The Lie algebra and Lie group are related by the exponential map

$$\exp(A) = \sum_{n=0}^{\infty} \frac{A^n}{n!}, \quad A \in L(S^1)$$

and the logarithm map

$$\log(B) = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{(B - I)^n}{n}, \quad B \in S^1, \; \|B - I\| < 1.$$
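The exponential and logarithm series can be evaluated directly for 2x2 matrices. The sketch below is a minimal illustration: the helper names and truncation lengths are assumptions, and the angle 0.4 is chosen so that the logarithm series converges.

```python
import math

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
R = [[0.0, 1.0], [-1.0, 0.0]]  # infinitesimal rotation

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def mat_exp(a, terms=30):
    """Truncated power series for exp(a)."""
    result, term = IDENTITY, IDENTITY
    for n in range(1, terms + 1):
        term = scale(1.0 / n, mul(term, a))  # term now holds a^n / n!
        result = add(result, term)
    return result

def mat_log(b, terms=200):
    """Truncated series for log(b); intended for b close to the identity."""
    d = add(b, scale(-1.0, IDENTITY))
    power = IDENTITY
    result = [[0.0, 0.0], [0.0, 0.0]]
    for n in range(1, terms + 1):
        power = mul(power, d)                # power now holds (b - I)^n
        result = add(result, scale((-1.0) ** (n - 1) / n, power))
    return result

theta = 0.4
rotation = mat_exp(scale(theta, R))   # close to [[cos, sin], [-sin, cos]]
recovered = mat_log(rotation)         # close to theta * R
print(rotation, recovered)
```

The logarithm series only recovers small angles; for rotations far from the identity a different branch selection (for example `math.atan2` on the matrix entries) would be needed.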


3.1 Signal Processes and Observation Processes

It has been shown [21, p. 269] that the circular Brownian motion on $S^1$ can be constructed by taking the projection modulo $2\pi$ of the standard 1-dimensional Brownian motion onto the unit circle $S^1$. This method will now be used to construct a continuous signal process on $S^1$ and to formulate the mathematical model of a sensor (an observation process) to be used in this report.

We will adopt the following notation:

$(\Omega, \mathcal{A}, P)$ = a probability space

$s$ = a positive real number

$C_1^s$ = the family of real-valued continuous functions, $a$, on $[0, s]$ such that $a(0) = 0$

$\mathcal{B}_1^s$ = the Borel $\sigma$-field of $C_1^s$

$C_2^s$ = the family of 2 x 2 orthogonal-matrix-valued continuous functions, $A$, on $[0, s]$ such that $A(0) = I$, the identity matrix

$\mathcal{B}_2^s$ = the Borel $\sigma$-field of $C_2^s$

Lower case letters denote elements in $C_1^s$ and upper case letters denote elements in $C_2^s$.

Let $J: C_1^s \to C_2^s$ be defined by

$$(J(a))(t) = \exp(a(t)R) = \begin{bmatrix} \cos a(t) & \sin a(t) \\ -\sin a(t) & \cos a(t) \end{bmatrix}$$

for $a \in C_1^s$ and $t \in [0, s]$. It is easily seen that $J$ is $\mathcal{B}_1^s$-measurable and bijective. This bijective operator will play a key role in this section.

Intuitively, $J$ can best be illustrated by Fig. 3. A point on the unit circle, $S^1$, can be represented by either the angle $\theta \in [-\pi, \pi)$ it makes with a fixed radial axis or the 2x2 orthogonal matrix $\exp(R\theta)$. Therefore, in the first representation, $C_2^s$ is the family of piecewise continuous functions $\theta(t)$, such that at any point of discontinuity the right-hand limit of $\theta$ is $\pm\pi$, while the left-hand limit is $\mp\pi$ (see Fig. 3).

Each continuous curve $a(t)$ on $R^1$ gives rise to one and only one piecewise continuous curve $\theta(t)$ lying between $\pi$ and $-\pi$, of which the continuous segments are obtained by translating the corresponding segments of $a(t)$ an integral number of multiples of $2\pi$ (see Fig. 3).


Conversely,  each  piecewise  continuous  curve  in  C2  gives  rise  to  one  and 
only  one  continuous  curve  taking  values  on  R*  which  is  obtained  simpl/  by 
piecing  the  continuous  segments  together.  This  intuitive  observation 
illustrates  the  bijective  property  of  the  operator  J.  Thus  a  continuous 
random  signal  process  on  S*  which  is  described  by  an  .^/-measurable 

g 

function  X:S2 — *  corresponds  to  a  continuous  random  signal  process 

1  s 

on  R  which  is  described  by  an  v«/-measurable  function  x:S2 — ►  Cj  such 

that 


X(t)  =  (J(x))(t)  ,  te[0,s]  . 


(29) 
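The correspondence between a real-valued path and its image on the circle can be checked numerically. The following is a minimal sketch, not part of the original report: it assumes the path is sampled finely enough that successive samples differ by less than $\pi$, and the helper names `J` and `J_inv` are illustrative. `J_inv` pieces the wrapped segments back together with NumPy's `unwrap`, which mirrors the "piecing together" argument above.

```python
import numpy as np

def J(a):
    """Map a sampled path a(t) with a[0] == 0 to rotation matrices exp(a(t) R).

    Returns an array of shape (N, 2, 2) with rows [cos a, sin a], [-sin a, cos a].
    """
    c, s = np.cos(a), np.sin(a)
    return np.stack([np.stack([c, s], axis=-1),
                     np.stack([-s, c], axis=-1)], axis=-2)

def J_inv(A):
    """Recover the continuous angle path from sampled rotation matrices.

    The wrapped angle is read off each matrix, then the jumps of 2*pi are
    removed. This is only valid when consecutive samples differ by less
    than pi (an assumption of this sketch).
    """
    theta = np.arctan2(A[:, 0, 1], A[:, 0, 0])  # wrapped angle in (-pi, pi]
    return np.unwrap(theta)
```

For a finely sampled path `a` starting at zero, `J_inv(J(a))` reproduces `a` even when `a` winds around the circle several times.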
where $m: R^1 \times R^1 \to R^1$ is Borel-measurable, $q: R^1 \to R^1$ is positive
and measurable, and $w$ is the standard Brownian motion on $(\Omega, \mathcal{A}, P)$,
independent of $x$. Let $Z: \Omega \to C_2^s$ be defined by

$$Z(t) = (J(z))(t). \tag{31}$$

Applying the Ito differentiation rule, we obtain the following Ito matrix
differential equation:

$$dZ(t) = Z(t) \begin{bmatrix} -\tfrac{q(t)}{2} & m(t) \\ -m(t) & -\tfrac{q(t)}{2} \end{bmatrix} dt + Z(t) \begin{bmatrix} 0 & q^{1/2}(t)\,dw(t) \\ -q^{1/2}(t)\,dw(t) & 0 \end{bmatrix}, \qquad Z(0) = I, \tag{32}$$

where $m(t) = m(x(t), t)$ and the diagonal terms are the second order
correction terms which keep $Z$ on the circle. This equation is the
mathematical model of the sensor to be used. We note that the input,
$x(t)$, to the sensor is not the dynamical state $X(t)$ of the rotational signal
process on the circle, but rather the angle the rotational process has
swept.
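The origin of the diagonal correction terms in (32) can be made explicit. The following is a sketch of the Ito calculation, assuming the scalar observation $dz = m\,dt + q^{1/2}\,dw$ restated at the start of Section 4 and $R = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ (the matrix implied by the form of $\exp(aR)$ above), so that $R^2 = -I$ and $R' = -R$.

```latex
\begin{aligned}
Z(t) &= \exp\bigl(z(t)R\bigr), \qquad (dz)^2 = q\,dt, \\
dZ &= Z R\,dz + \tfrac{1}{2} Z R^2 (dz)^2
    = Z R\,\bigl(m\,dt + q^{1/2}\,dw\bigr) - \tfrac{1}{2}\,q\,Z\,dt, \\
d(Z'Z) &= dZ'\,Z + Z'\,dZ + dZ'\,dZ \\
       &= \bigl(-R\,dz - \tfrac{1}{2} q\,dt\bigr)
        + \bigl(R\,dz - \tfrac{1}{2} q\,dt\bigr) + q\,dt = 0 .
\end{aligned}
```

The last line shows why the $-q/2$ terms are needed: without them $Z'Z$ would drift away from the identity and $Z$ would leave the circle.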

The  physical  motivation  for  this  sensor  model  comes  from  the  fact 
that  in  observing  a  rotational  process  (for  instance  a  gyroscope  recording 
rotation  about  a  fixed  axis)  our  measurement  contains  information  on  the 
total  rotation,  x(t),  not  just  the  orientation,  X(t).  In  some  applications, 
such  as  the  gyro  problem  mentioned  above,  we  wish  to  extract  knowledge 
of  orientation  from  knowledge  of  rotation,  so  it  is  proper  to  regard  X(t) 
as the signal process. However, in other applications, such as FM
demodulation, our interest centers on the $x$ process, and in these cases, we
may regard $x$ as the signal.


3.2 Conditional Probability Distributions

In  this  subsection,  we  will  derive  equations  for  the  conditional 

probability  distribution  of  the  signal  process  given  observations  over  some 

time  period.  The  approach  of  this  section  is  measure-theoretic  in  nature, 

and  the  major  results  are  summarized  in  the  statements  of  Lemma  2, 

Theorem  4,  and  its  two  corollaries. 

Let us denote $\{z(\tau), \tau \in [0, t]\}$ and $\{Z(\tau), \tau \in [0, t]\}$ by $z^t$
and $Z^t$, respectively. We note that $Z^t = J(z^t)$. Since $J$ is bijective from
$C_1^t$ to $C_2^t$, the $\sigma$-subfield of $\mathcal{A}$ generated by $z^t$ is the same as that
generated by $Z^t$. In other words, the information carried by $z^t$ and $Z^t$
is the same. That $\sigma$-subfield will be denoted by $\mathcal{A}_z^t$. The $\sigma$-subfield of
$\mathcal{A}$ which is generated by $X_\lambda = X(\lambda)$ (the subscripts $\lambda$, $s$, $t$ denote that the
processes are evaluated at these times) will be denoted by $\mathcal{A}_x$.

Let $P_{xz}$ be the conditional probability measure on $(\Omega, \mathcal{A}_x)$ given
$\mathcal{A}_z^t$, defined by $P_{xz}(A, \omega_2) = P(A \mid \mathcal{A}_z^t)(\omega_2)$, for $A \in \mathcal{A}_x$, $\omega_2 \in \Omega$. Let $P_{zx}$
be the conditional probability measure on $(\Omega, \mathcal{A}_z^t)$ given $\mathcal{A}_x$, defined by
$P_{zx}(B, \omega_1) = P(B \mid \mathcal{A}_x)(\omega_1)$, for $B \in \mathcal{A}_z^t$, $\omega_1 \in \Omega$. The restrictions of $P$ to
$\mathcal{A}_z^t$ and $\mathcal{A}_x$ are denoted by $P_z$ and $P_x$, respectively. Let $\mu_z$ and $\mu_w$
be the induced measures on $(C_1^t, \mathcal{B}_1^t)$ by $z^t$ and $w^t$, respectively. Define
the conditional measure $\mu_{zx}$ on $(C_1^t, \mathcal{B}_1^t)$, given $X_\lambda$, by $\mu_{zx}(B, \omega_1) =
P(z^{-1}(B) \mid \mathcal{A}_x)(\omega_1)$, for $B \in \mathcal{B}_1^t$, $\omega_1 \in \Omega$.

It is known (ref. 15) that $\mu_{zx} \sim \mu_w \sim \mu_z$, where $\sim$ denotes
equivalence of measures, and
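$$\frac{d\mu_{zx}}{d\mu_w}(z^t, \omega_1) = \mathcal{E}\bigl(\theta^t \mid X_\lambda = X_\lambda(\omega_1)\bigr)$$

$$\frac{d\mu_z}{d\mu_w}(z^t) = \mathcal{E}_x(\theta^t)$$

where $\mathcal{E}_x$ means taking the average over $x$ and

$$\theta^t = \exp\Bigl( -\frac{1}{2} \int_0^t \frac{m^2}{q}(\tau)\,d\tau + \int_0^t \frac{m}{q}(\tau)\,dz(\tau) \Bigr) \tag{37}$$

where $\int$ denotes an Ito integral.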
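Hence

$$\frac{dP_{zx}}{dP_z}(\omega_2, \omega_1) = \frac{d\mu_{zx}}{d\mu_z}(z^t(\omega_2), \omega_1) = \frac{\mathcal{E}\bigl(\theta_1^t \mid X_\lambda = X_\lambda(\omega_1)\bigr)}{\mathcal{E}_x(\theta_1^t)}$$

where

$$\theta_1^t = \exp\Bigl( -\frac{1}{2} \int_0^t \frac{m^2}{q}(\tau)\,d\tau + \int_0^t \frac{m}{q}(\tau)\,dz(\tau, \omega_2) \Bigr).$$

We note that $\frac{dP_{zx}}{dP_z}(\omega_2, \omega_1)$ is $\mathcal{A}_z^t \times \mathcal{A}_x$-measurable. Applying a general
Bayes rule from ref. 16, we obtain

$$\frac{dP_{xz}}{dP_x}(\omega_1, \omega_2) = \frac{dP_{zx}}{dP_z}(\omega_2, \omega_1).$$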


Let us denote the family of 2 x 2 orthogonal matrices by $M_0$. The
set of induced Borel sets is denoted by $\mathcal{B}_0$. Let $\nu_{xz}$ be the conditional
measure on $(M_0, \mathcal{B}_0)$ given $\mathcal{A}_z^t$, defined by $\nu_{xz}(A, \omega_2) = P(X_\lambda^{-1}(A) \mid \mathcal{A}_z^t)(\omega_2)$,
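for $A \in \mathcal{B}_0$, $\omega_2 \in \Omega$. Let $\nu_x$ be the measure on $(M_0, \mathcal{B}_0)$ induced by $X_\lambda$.
Then it is easily seen that

$$\frac{d\nu_{xz}}{d\nu_x}\bigl(X_\lambda(\omega_1), Z^t(\omega_2)\bigr) = \frac{dP_{xz}}{dP_x}(\omega_1, \omega_2) = \frac{\mathcal{E}\bigl(\theta^t \mid X_\lambda = X_\lambda(\omega_1)\bigr)}{\mathcal{E}_x(\theta^t)}$$

where $\theta^t$ is defined by (37). Summarizing what has been shown, we have
the following lemma.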

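Lemma 2: Consider the observation process described by (32). The
conditional probability measure for the signal $X_\lambda$ given the observation $Z^t$,
$\nu_{xz}$, is then absolutely continuous with respect to $\nu_x$, the a priori measure
for $X_\lambda$, and, for $Z^t \in C_2^t$ and $X \in M_0$,

$$\frac{d\nu_{xz}}{d\nu_x}(X, Z^t) = \frac{\mathcal{E}(\theta^t \mid X_\lambda = X)}{\mathcal{E}(\theta^t)}$$

where

$$\theta^t = \exp\Bigl( -\frac{1}{2} \int_0^t \frac{m^2}{q}(\tau)\,d\tau + \int_0^t \frac{m}{q}(\tau)\,[Z'(\tau)\,dZ(\tau)]_{12} \Bigr) \tag{41}$$

$$[Z'(\tau)\,dZ(\tau)]_{12} = [1, 0]\; Z'(\tau)\,dZ(\tau) \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$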
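The quantity $[Z'(\tau)\,dZ(\tau)]_{12}$ is what lets the circle-valued observation be turned back into an ordinary scalar increment. The following is a numerical sketch under stated assumptions, not taken from the report: it uses a hypothetical constant drift `m = 0.5` and `q = 1`, samples a path on a fine grid, and uses the discrete product $Z_k'(Z_{k+1} - Z_k)$ as a stand-in for $Z'\,dZ$. The helper names are illustrative.

```python
import numpy as np

def rot(angle):
    """Rotation matrix exp(angle * R) with R = [[0, 1], [-1, 0]]."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, s], [-s, c]])

def recover_increments(Z):
    """Discrete analogue of [Z'(t) dZ(t)]_12 for samples Z of shape (N, 2, 2)."""
    dZ = Z[1:] - Z[:-1]
    prod = np.einsum('kji,kjl->kil', Z[:-1], dZ)  # Z_k' (Z_{k+1} - Z_k)
    return prod[:, 0, 1]                          # equals sin(dz_k), close to dz_k

rng = np.random.default_rng(1)
dt, n = 1e-4, 5000
# Hypothetical observation increments dz = m dt + sqrt(q) dw with m = 0.5, q = 1.
dz = 0.5 * dt + np.sqrt(dt) * rng.standard_normal(n)
z = np.concatenate([[0.0], np.cumsum(dz)])
Z = np.stack([rot(a) for a in z])                 # sensor output on the circle
z_rec = np.concatenate([[0.0], np.cumsum(recover_increments(Z))])
```

On this grid `z_rec` tracks `z` closely, which is the discrete counterpart of the statement that $z^t$ and $Z^t$ carry the same information.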


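If the density function of $\nu_x$ exists and is denoted by $p_{X_\lambda}(\cdot)$, then it
follows from Lemma 1 that the density function $p_{X_\lambda}(\cdot \mid Z^t)$ of $\nu_{xz}$ exists
and can be expressed as follows:

$$p_{X_\lambda}(X \mid Z^t) = \frac{\mathcal{E}(\theta^t \mid X_\lambda = X)\; p_{X_\lambda}(X)}{\mathcal{E}(\theta^t)}$$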
where $\theta^t$ is defined by (41). Let $x \in R^1$ be defined by $\exp Rx = X$ and
$-\pi \le x < \pi$. Then by simple calculations,

$$p_{X_\lambda}(X \mid Z^t) = \frac{\mathcal{E}(\theta^t \mid x_\lambda = x + 2k\pi,\; k = 0, \pm 1, \pm 2, \ldots)\; p_{X_\lambda}(X)}{\mathcal{E}(\theta^t)} = \sum_{k=-\infty}^{\infty} \frac{\mathcal{E}_x(\theta^t \mid x(\lambda) = x + 2k\pi)\; p_{x_\lambda}(x + 2k\pi)}{\mathcal{E}_x(\theta^t)} \tag{44}$$

where $p_{x_\lambda}$ denotes the density function of $x(\lambda)$. This completes the
proof of the following theorem.

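Theorem 4: Consider the observation process described by (32). If the
density function $p_{x_\lambda}$ of $x(\lambda)$ exists, then the conditional density function
$p_{X_\lambda}(\cdot \mid Z^t)$ exists and can be expressed as follows:

$$p_{X_\lambda}(X \mid Z^t) = \sum_{k=-\infty}^{\infty} p_{x_\lambda}(x + 2k\pi \mid Z^t) = \sum_{k=-\infty}^{\infty} \frac{\mathcal{E}_x(\theta^t \mid x(\lambda) = x + 2k\pi)\; p_{x_\lambda}(x + 2k\pi)}{\mathcal{E}_x(\theta^t)} \tag{45}$$

where $\theta^t$ is defined by (41), $p_{x_\lambda}$ denotes the density function of $x(\lambda)$ and
$x$ is defined by $\exp Rx = X$ and $-\pi \le x < \pi$.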

It is appropriate to remark that one can easily derive the stochastic
partial differential equation for the conditional density $p_{X_\lambda}(X \mid Z^t)$ using
Theorem 4 and the well-known equation (refs. 19, 20) for $p_{x_\lambda}(x + 2k\pi \mid Z^t)$,
$-\infty < k < \infty$. For economy of space, this equation will not be displayed.
However we remark that when $m(x, t)$ is periodic in $x$ with period $2\pi$, the
equation is in a form similar to the Stratonovich-Kushner equation with
$p_{x_\lambda}$ replaced by $p_{X_\lambda}$.


Using  Theorem  4  and  the  well-known  fact  (refs.  17, 18)  that  the 


smoothed  and  the  predicted  densities  can  be  expressed  explicitly  in  terms 


of  filtering,  we  can  easily  obtain  the  following  two  corollaries. 


Corollary 1: The conditional smoothed density $p_{X_\lambda}(X \mid Z^t)$, for $t_0 \le \lambda \le t$,
may be expressed in terms of the conditional filtered density as follows:


PSx«X|Z‘)=  I  P^ta|zSeXp(|  ^jd.) 


k=-oo 


where $x$ is defined by $\exp Rx = X$ and $-\pi \le x < \pi$ and


dlg  =  [Z'(s)dZ(s)]12  -  m(s)ds 


rn(s  jx^  =  x)  -  m(s) 


m(s)  =  <?(m(s)  [  Zs) 


m(sjxx=x)  =  ZS,  x^  =  x)  . 


Corollary 2: Let $X$ be a Markov process with given transition density
$p_{X_\lambda}(X \mid x(t) = \xi)$. The conditional predicted density $p_{X_\lambda}(X \mid Z^t)$, for $t_0 \le t \le \lambda$,
may be expressed in terms of the conditional filtered density as follows:

$$p_{X_\lambda}(X \mid Z^t) = \int_{-\infty}^{+\infty} p_{X_\lambda}(X \mid x(t) = \xi)\; p_{x_t}(\xi \mid Z^t)\, d\xi$$
3.3 Optimal Estimation

In  the  previous  subsection,  the  conditional  probability  distributions 

were  studied.  A  variety  of  estimation  problems  may  be  studied  based  on 

those  conditional  distributions.  However,  some  estimation  problems  on  the 

circle  can  be  directly  solved  by  using  results  in  vector-space  estimation 

theory.  In  this  subsection,  the  well-established  linear  optimal  estimation 

theory  will  be  used  to  deduce  recursive  equations  for  optimal  estimation  on 

$S^1$ and thereby illustrate the approach.

The estimation problem which we will mainly be concerned with in
this subsection is that of constructing a 2 x 2 orthogonal random matrix
$\hat{X}(\lambda \mid t)$ as a $\mathcal{B}_2^t$-measurable functional of $Z^t$ such that for a symmetric
cost function $\phi$ defined by eq. (7), the following inequality holds for all
$\mathcal{A}_z^t$-measurable 2 x 2 orthogonal random matrices $M$:

$$\mathcal{E}\bigl(\Phi(X(\lambda), \hat{X}(\lambda \mid t)) \mid Z^t\bigr) \le \mathcal{E}\bigl(\Phi(X(\lambda), M) \mid Z^t\bigr) \tag{52}$$

in which $\Phi(X_1, X_2) \triangleq \phi(\theta)$, $\theta$ being defined by $\exp R\theta = X_1' X_2$ and
$-\pi \le \theta < \pi$ (i.e. $\theta$ is the angle between $X_1$ and $X_2$).

We have seen, at the beginning of this section, that a continuous
random process $X$ on $S^1$ can be identified with a continuous random process
$x$ on $R^1$ via the bijective mapping $X = J(x)$. We now construct a signal
process $X$ on $S^1$ by injecting a linear diffusion $x$ into $S^1$, $x$ satisfying

$$dx(t) = a(t)x(t)\,dt + b^{1/2}(t)\,dv(t), \qquad x(0) = 0 \tag{53}$$

where $b(t) > 0$, $\forall t \in T$, and $v$ is a standard Brownian motion, independent
of $w$, the observational noise. Applying the stochastic differentiation rule,
we obtain the following stochastic differential equation for our signal
process $X = J(x)$:

$$dX(t) = -\tfrac{1}{2} b(t) X(t)\,dt + X(t) R \Bigl\{ a(t) \Bigl[ \int_0^t \Bigl( \exp \int_s^t a(\tau)\,d\tau \Bigr) b^{1/2}(s)\,dv(s) \Bigr] dt + b^{1/2}(t)\,dv(t) \Bigr\} \tag{54}$$

$$X(0) = I$$

where we note that $x(t) = \int_0^t \bigl( \exp \int_s^t a(\tau)\,d\tau \bigr) b^{1/2}(s)\,dv(s)$.

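The observation process to be used in this subsection is taken to be $Z$,
satisfying the stochastic differential equation:

$$dZ(t) = Z(t) \begin{bmatrix} -\tfrac{q(t)}{2} & c(t)x(t) \\ -c(t)x(t) & -\tfrac{q(t)}{2} \end{bmatrix} dt + Z(t) \begin{bmatrix} 0 & q^{1/2}(t)\,dw(t) \\ -q^{1/2}(t)\,dw(t) & 0 \end{bmatrix}, \qquad Z(0) = I \tag{55}$$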

As shown in subsection 3.2, $Z$ can be identified with $z = J^{-1}(Z)$ satisfying

$$dz(t) = c(t)x(t)\,dt + q^{1/2}(t)\,dw(t), \qquad z(0) = 0. \tag{56}$$

Note that the equations for $X$ and $Z$ are both bilinear in form. Moreover,
$z^t$ and $Z^t$ generate the same $\sigma$-subfield in $(\Omega, \mathcal{A}, P)$. Hence
$\mathcal{E}(x(\lambda) \mid \mathcal{A}_z^t)$ is both a $\mathcal{B}_1^t$-measurable functional $f_1$ of $z^t$ and a $\mathcal{B}_2^t$-measurable
functional $f_2$ of $Z^t$, and

$$f_2(Z^t) = f_1(J^{-1}(Z^t)). \tag{57}$$

Let $\hat{x}_{\lambda|t}$ and $\hat{x}(\lambda \mid t)$ denote $f_1(z^t) = \mathcal{E}(x(\lambda) \mid z^t)$ and $f_2(Z^t) = \mathcal{E}(x(\lambda) \mid Z^t)$
respectively.


We will first study the filtering problem, where $\lambda = t$. Then the
Kalman-Bucy linear filtering theory yields immediately

$$d\hat{x}_{t|t} = a(t)\hat{x}_{t|t}\,dt + K(t)c(t)q^{-1}(t)\bigl(dz(t) - c(t)\hat{x}_{t|t}\,dt\bigr) \tag{58}$$

$$\dot{K}(t) = 2a(t)K(t) - c^2(t)q^{-1}(t)K^2(t) + b(t), \qquad K(0) = 0 \tag{59}$$


In  view  of  (57),  we  obtain  the  following  lemma,  which  not  only  leads  to  the 
solution  of  the  above  stated  filtering  problem  but  also  applies  directly  to 
optimal  frequency  demodulation  (see  Section  5). 

Lemma 3: Let the stochastic process (54) be the signal process and the
stochastic process (55) be the observation process. Then the filtering
equations are

$$d\hat{x}(t \mid t) = a(t)\hat{x}(t \mid t)\,dt + K(t)c(t)q^{-1}(t)\bigl([Z'(t)\,dZ(t)]_{12} - c(t)\hat{x}(t \mid t)\,dt\bigr) \tag{60}$$

$$\hat{x}(0 \mid 0) = 0$$

$$\dot{K}(t) = 2a(t)K(t) - c^2(t)q^{-1}(t)K^2(t) + b(t) \tag{61}$$

$$K(0) = 0$$

and the conditional probability density is given by

$$p_{x_t}(x \mid Z^t) = \frac{1}{\sqrt{2\pi K(t)}} \exp\Bigl[ -\frac{1}{2K(t)} \bigl(x - \hat{x}(t \mid t)\bigr)^2 \Bigr]. \tag{62}$$

In view of Theorem 4, we see that $p_{X_t}(X \mid Z^t)$ is a folded normal
density. By Theorem 2, it follows that $p_{X_t}(X \mid Z^t)$ is unimodal with mode
at $\exp(\hat{x}(t \mid t)R)$ and is symmetric about it. We may now conclude from
Theorem 1 that for a cost function defined by (7),

$$\mathcal{E}\bigl(\Phi(X(t), \exp[\hat{x}(t \mid t)R]) \mid Z^t\bigr) \le \mathcal{E}\bigl(\Phi(X(t), M) \mid Z^t\bigr)$$

for any $\mathcal{A}_z^t$-measurable 2 x 2-dimensional orthogonal random matrix $M$.
Since $\exp[\hat{x}(t \mid t)R]$ is easily seen to be a measurable functional of $Z^t$,
it follows that the optimal estimate of our signal process is

$$\hat{X}(t \mid t) = \exp[\hat{x}(t \mid t)R]. \tag{64}$$

Differentiating this with respect to $t$ yields

$$d\hat{X}(t \mid t) = -\tfrac{1}{2} K^2(t)c^2(t)q^{-1}(t)\hat{X}(t \mid t)\,dt + \hat{X}(t \mid t) R \Bigl\{ \bigl(a(t) - K(t)c^2(t)q^{-1}(t)\bigr) \Bigl[ \int_0^t \Bigl( \exp \int_s^t \bigl(a(\tau) - K(\tau)c^2(\tau)q^{-1}(\tau)\bigr)d\tau \Bigr) K(s)c(s)q^{-1}(s)\,[Z'(s)\,dZ(s)]_{12} \Bigr] dt + K(t)c(t)q^{-1}(t)\,[Z'(t)\,dZ(t)]_{12} \Bigr\}$$

Summarizing  what  has  been  shown,  we  obtain  the  following  theorem. 

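Theorem 5: If the signal process $X$ and the observation process $Z$ on
$S^1$ satisfy the following stochastic differential equations:

$$dX(t) = -\tfrac{1}{2} b(t) X(t)\,dt + X(t) R \Bigl\{ a(t) \Bigl[ \int_0^t \Bigl( \exp \int_s^t a(\tau)\,d\tau \Bigr) b^{1/2}(s)\,dv(s) \Bigr] dt + b^{1/2}(t)\,dv(t) \Bigr\}, \qquad X(0) = I$$

$$dZ(t) = Z(t) \begin{bmatrix} -\tfrac{q(t)}{2} & c(t)\bigl[\int_0^t X'(s)\,dX(s)\bigr]_{12} \\ -c(t)\bigl[\int_0^t X'(s)\,dX(s)\bigr]_{12} & -\tfrac{q(t)}{2} \end{bmatrix} dt + Z(t) \begin{bmatrix} 0 & q^{1/2}(t)\,dw(t) \\ -q^{1/2}(t)\,dw(t) & 0 \end{bmatrix}, \qquad Z(0) = I$$

where $w$ and $v$ are independent standard Brownian motions on $R^1$, then
the optimal estimate $\hat{X}(t \mid t)$ in the sense of (52) satisfies the following
stochastic differential equations:

$$d\hat{X}(t \mid t) = -\tfrac{1}{2} K^2(t)c^2(t)q^{-1}(t)\hat{X}(t \mid t)\,dt + \hat{X}(t \mid t) R \Bigl\{ \bigl(a(t) - K(t)c^2(t)q^{-1}(t)\bigr) \Bigl[ \int_0^t \Bigl( \exp \int_s^t \bigl(a(\tau) - K(\tau)c^2(\tau)q^{-1}(\tau)\bigr)d\tau \Bigr) K(s)c(s)q^{-1}(s)\,[Z'(s)\,dZ(s)]_{12} \Bigr] dt + K(t)c(t)q^{-1}(t)\,[Z'(t)\,dZ(t)]_{12} \Bigr\} \tag{68}$$

$$\dot{K}(t) = 2a(t)K(t) - c^2(t)q^{-1}(t)K^2(t) + b(t), \qquad K(0) = 0 \tag{69}$$

The conditional probability density is given by

$$p_{X_t}(X \mid Z^t) = \frac{1}{\sqrt{2\pi K(t)}} \sum_{k=-\infty}^{\infty} \exp\Bigl[ -\frac{1}{2K(t)} \bigl(x + 2k\pi - \hat{x}(t \mid t)\bigr)^2 \Bigr] = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \exp\Bigl[ -\frac{k^2 K(t)}{2} \Bigr] \cos k\bigl(x - \hat{x}(t \mid t)\bigr) \tag{70}$$

where $x$ is defined by $\exp Rx = X$ and $-\pi \le x < \pi$.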
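A folded normal density can be written either as a sum of shifted Gaussians or as a Fourier cosine series, and the two agree by Poisson summation. The following sketch compares both truncated series numerically; the function names and truncation lengths are illustrative choices, not from the report, and `mean` and `var` stand in for the filter outputs (the estimate and $K(t)$).

```python
import numpy as np

def folded_normal_sum(x, mean, var, n_wraps=50):
    """Folded normal density at angle x as a truncated sum of shifted Gaussians."""
    k = np.arange(-n_wraps, n_wraps + 1)
    terms = np.exp(-(x + 2 * np.pi * k - mean) ** 2 / (2 * var))
    return terms.sum() / np.sqrt(2 * np.pi * var)

def folded_normal_fourier(x, mean, var, n_terms=200):
    """The same density as a truncated Fourier cosine series."""
    k = np.arange(-n_terms, n_terms + 1)
    terms = np.exp(-k ** 2 * var / 2) * np.cos(k * (x - mean))
    return terms.sum() / (2 * np.pi)
```

The Gaussian sum converges quickly when the variance is small, while the Fourier form converges quickly when the variance is large, so the better choice in practice depends on the size of $K(t)$.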

The expected error $\mathcal{E}(\Phi(X(t), \hat{X}(t \mid t)))$ of the optimal estimate $\hat{X}(t \mid t)$
can be obtained by straightforward computation with the aid of (70). Some
examples are given in the following corollary.

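Corollary: Let $\theta$ be defined by $\exp R\theta = X$ and $-\pi \le \theta < \pi$. Then

(i) for $\phi(\theta) = 1 - \cos\theta$,

$$\mathcal{E}\bigl(\Phi(X(t), \hat{X}(t \mid t))\bigr) = 1 - \exp\bigl(-\tfrac{1}{2} K(t)\bigr)$$

(ii) for $\phi(\theta) = (1 - \cos\theta)^2$,

$$\mathcal{E}\bigl(\Phi(X(t), \hat{X}(t \mid t))\bigr) = \tfrac{3}{2} - 2\exp\bigl(-\tfrac{1}{2} K(t)\bigr) + \tfrac{1}{2}\exp\bigl(-2K(t)\bigr)$$

(iii) for $\phi(\theta) = \rho(\theta)$, the Riemannian metric,

$$\mathcal{E}\bigl(\Phi(X(t), \hat{X}(t \mid t))\bigr) = \frac{\pi}{2} - \frac{4}{\pi} \sum_{k=0}^{\infty} \frac{1}{(2k+1)^2} \exp\Bigl[ -\frac{(2k+1)^2 K(t)}{2} \Bigr]$$

(iv) for $\phi(\theta) = \rho^2(\theta)$,

$$\mathcal{E}\bigl(\Phi(X(t), \hat{X}(t \mid t))\bigr) = \frac{\pi^2}{3} - 4 \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k^2} \exp\Bigl( -\frac{k^2 K(t)}{2} \Bigr)$$

We recall that $K(t) = \mathcal{E}(x(t) - \hat{x}_{t|t})^2$. From this Corollary, it can be
seen that for the cases (i) ~ (iv), $\mathcal{E}(\Phi(X(t), \hat{X}(t \mid t)))$ is a monotone increasing
function of $\mathcal{E}(x(t) - \hat{x}_{t|t})^2$. It has been shown in Section 2 that this property
holds for all $\phi$ defined by (7).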
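The four closed forms can be sanity-checked by simulation. The following is a Monte Carlo sketch under the assumption, consistent with (62), that the unwrapped estimation error is normal with mean zero and variance $K$; the function name, sample size, and series truncation are illustrative choices, not from the report.

```python
import numpy as np

def expected_costs(K, n_samples=400_000, n_terms=200, seed=0):
    """Compare Monte Carlo averages of four cost functions with series formulas.

    The error angle is a zero-mean normal variable with variance K, wrapped
    into [-pi, pi). Returns (monte_carlo, closed_form) dictionaries.
    """
    rng = np.random.default_rng(seed)
    err = np.sqrt(K) * rng.standard_normal(n_samples)
    theta = (err + np.pi) % (2 * np.pi) - np.pi      # wrap to [-pi, pi)

    monte_carlo = {
        "i": np.mean(1 - np.cos(theta)),
        "ii": np.mean((1 - np.cos(theta)) ** 2),
        "iii": np.mean(np.abs(theta)),               # rho(theta) = |theta|
        "iv": np.mean(theta ** 2),
    }

    k = np.arange(1, n_terms + 1)
    odd = 2 * np.arange(n_terms) + 1
    closed_form = {
        "i": 1 - np.exp(-K / 2),
        "ii": 1.5 - 2 * np.exp(-K / 2) + 0.5 * np.exp(-2 * K),
        "iii": np.pi / 2 - (4 / np.pi) * np.sum(np.exp(-odd ** 2 * K / 2) / odd ** 2),
        "iv": np.pi ** 2 / 3 - 4 * np.sum((-1.0) ** (k + 1) / k ** 2 * np.exp(-k ** 2 * K / 2)),
    }
    return monte_carlo, closed_form
```

Each closed form follows from $\mathcal{E}\cos(n\theta) = \exp(-n^2 K/2)$ applied term by term to the Fourier series of the cost function, which is why all four expressions increase with $K$.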


We note that the optimal filtering equations (68) and (69) are complex
in form. The concept of the filtering procedure, however, is quite simple,
and is best illustrated by the block diagram of Fig. 4.
The observation process $dZ$ first goes through a nonlinear transformer.
The transformed process $[Z'\,dZ]_{12}$ then goes through a Kalman-Bucy
linear filter. Then we inject the filtered process $\hat{x}(t \mid t)$ into $S^1$ via the
injection mapping $J$. The output $\hat{X}(t \mid t)$ of the nonlinear injector is the
desired estimate.

[Fig. 4: block diagram of the filter; recovered label: Nonlinear Preprocessing]
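The three stages of the block diagram (nonlinear preprocessing, linear filtering, injection) can be sketched in a few lines. This is an illustrative Euler discretization, not the report's implementation: the constant coefficients `a`, `b`, `c`, `q`, the step size, and the function names are all assumptions chosen for the example.

```python
import numpy as np

def rot(angle):
    """Rotation matrix exp(angle * R) with R = [[0, 1], [-1, 0]]."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, s], [-s, c]])

def simulate_and_filter(a=-1.0, b=1.0, c=1.0, q=0.1, dt=1e-3, n=5000, seed=0):
    """Simulate (53) and (55) with constant coefficients and run the filter."""
    rng = np.random.default_rng(seed)
    x, Z = 0.0, np.eye(2)          # signal angle and circle-valued observation
    x_hat, K = 0.0, 0.0            # filter state and error variance
    xs, x_hats = [], []
    for _ in range(n):
        # Unwrapped observation increment (56), then signal step (53).
        dz = c * x * dt + np.sqrt(q * dt) * rng.standard_normal()
        x += a * x * dt + np.sqrt(b * dt) * rng.standard_normal()
        Z_next = Z @ rot(dz)       # what the sensor actually delivers
        # Nonlinear preprocessor: discrete analogue of [Z' dZ]_12.
        dz_rec = (Z.T @ (Z_next - Z))[0, 1]
        # Kalman-Bucy filter: Euler steps of (60) and (61).
        x_hat += a * x_hat * dt + K * c / q * (dz_rec - c * x_hat * dt)
        K += (2 * a * K - c ** 2 * K ** 2 / q + b) * dt
        Z = Z_next
        xs.append(x)
        x_hats.append(x_hat)
    X_hat = rot(x_hat)             # nonlinear injector J
    return np.array(xs), np.array(x_hats), K, X_hat
```

With constant coefficients the variance recursion settles at the positive root of $2aK - c^2K^2/q + b = 0$, so the steady-state gain can be precomputed if only long-run behavior matters.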


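The same approach can be used to solve the smoothing and
prediction problems. The solution to the prediction problem is trivial and
hence omitted here. For the smoothing problem, we first recall (ref. 22)
that for $0 \le \lambda \le t$,

$$\hat{x}_{\lambda|t} = \hat{x}_{\lambda|\lambda} + K(\lambda) \int_\lambda^t \Bigl( \exp \int_\lambda^s \bigl(a(\tau) - K(\tau)c^2(\tau)q^{-1}(\tau)\bigr)d\tau \Bigr) c(s)q^{-1}(s)\bigl(dz(s) - c(s)\hat{x}_{s|s}\,ds\bigr)$$

By (57), it follows that

$$\hat{x}(\lambda \mid t) = \hat{x}(\lambda \mid \lambda) + K(\lambda) \int_\lambda^t \Bigl( \exp \int_\lambda^s \bigl(a(\tau) - K(\tau)c^2(\tau)q^{-1}(\tau)\bigr)d\tau \Bigr) c(s)q^{-1}(s)\bigl([Z'(s)\,dZ(s)]_{12} - c(s)\hat{x}(s \mid s)\,ds\bigr). \tag{76}$$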

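We note that the conditional probability distribution of $x(\lambda)$ given
$Z^t$ is Gaussian. From Theorem 4, it follows that $p_{X_\lambda}(X \mid Z^t)$ is a
folded-Gaussian density and hence unimodal. As in the filtering case,

$$\hat{X}(\lambda \mid t) = \exp\bigl(\hat{x}(\lambda \mid t)R\bigr). \tag{77}$$

Substituting (76) into (77) thus yields

$$\hat{X}(\lambda \mid t) = \hat{X}(\lambda \mid \lambda) \exp\Bigl\{ R K(\lambda) \int_\lambda^t \Bigl( \exp \int_\lambda^s \bigl(a(\tau) - K(\tau)c^2(\tau)q^{-1}(\tau)\bigr)d\tau \Bigr) c(s)q^{-1}(s) \Bigl( [Z'(s)\,dZ(s)]_{12} - c(s) \Bigl[ \int_0^s \hat{X}'(\tau \mid \tau)\,d\hat{X}(\tau \mid \tau) \Bigr]_{12} ds \Bigr) \Bigr\}$$

where we have used the identity $\hat{x}(s \mid s) = \bigl[\int_0^s \hat{X}'(\tau \mid \tau)\,d\hat{X}(\tau \mid \tau)\bigr]_{12}$.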
Summarizing  what  has  been  shown,  we  obtain  the  following  theorem. 
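Theorem 6: If the signal process and the observation process are the
same as in Theorem 5, then the optimal estimate, $\hat{X}(\lambda \mid t)$, $0 \le \lambda \le t$, in
the sense of (52), is given by

$$\hat{X}(\lambda \mid t) = \hat{X}(\lambda \mid \lambda) \exp\Bigl\{ R K(\lambda) \int_\lambda^t \Bigl( \exp \int_\lambda^s \bigl(a(\tau) - K(\tau)c^2(\tau)q^{-1}(\tau)\bigr)d\tau \Bigr) c(s)q^{-1}(s) \Bigl( [Z'(s)\,dZ(s)]_{12} - c(s) \Bigl[ \int_0^s \hat{X}'(\tau \mid \tau)\,d\hat{X}(\tau \mid \tau) \Bigr]_{12} ds \Bigr) \Bigr\}$$

where $\hat{X}(\tau \mid \tau)$, $K(t)$ can be obtained from (68) and (69).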

The conditional probability density of $X(\lambda)$ given $Z^t$, the expected
errors $\mathcal{E}(\Phi(X(\lambda), \hat{X}(\lambda \mid t)))$, and the stochastic equations for $\hat{X}(\lambda \mid t)$ for
fixed-point smoothing, fixed-lag smoothing, and fixed-interval smoothing can all
be easily obtained by straightforward computations. They are left to the
interested readers.


3.4 Random Initial State

In the previous subsections, the initial state of the signal process $X$
is assumed to be $X(0) = I$, the identity matrix. This is obviously not a
practical assumption in some applications. In this subsection we will
consider the case in which the initial state is a random variable. We will
denote the signal process by $Y$ in this subsection, and assume that
$Y(0) = Y_0$ is a random variable independent of the observational noise $w$.

We  observe  that  the  input  to  the  observation  process  (32)  at  time  t 
is  not  the  dynamical  state  of  the  signal.  It  is  the  angle  that  the  rotational 


process  represented  by  the  signal  has  swept  over  the  time  interval  [0,t]. 
Taking  this  viewpoint,  our  present  problem  can  be  solved  with  some 
modification  to  the  previous  results. 

Let $y(t)$ denote the angle that the signal $Y$ has swept over $[0, t]$. It is
easily seen that

$$y(t) = \Bigl[ \int_0^t Y'(s)\,dY(s) \Bigr]_{12}. \tag{80}$$

Define a rotational process $X$ by

$$X(t) = Y_0' Y(t). \tag{81}$$

Then $X(0) = I$ and, as before, we may define

$$x(t) = (J^{-1}(X))(t) = \Bigl[ \int_0^t X'(s)\,dX(s) \Bigr]_{12}. \tag{82}$$

We note that $x(t) = y(t)$. In other words, the angles swept by $X$ and by
$Y$ over $[0, t]$ are the same. Hence (32) can also be used as the
observation process for our present problem. The conditional distribution
of $X(\lambda)$ given observation $Z^t$ of the form given in (32) can be determined
by the application of the previous results.

We note that $Y_0$ and $X(\lambda)$ are conditionally independent given $Z^t$.
If the distribution of $Y_0$ and the conditional distribution of $X(\lambda)$ given
$Z^t$ are both folded normal, then the following lemma easily leads to the
conclusion that the optimal estimate $\hat{Y}(\lambda \mid t)$ of $Y(\lambda)$ given $Z^t$ is equal to
$\hat{Y}_0 \hat{X}(\lambda \mid t)$, where $\hat{Y}_0$ is the mode of the distribution of $Y_0$ and $\hat{X}(\lambda \mid t)$ is
the mode of the conditional distribution of $X(\lambda)$ given $Z^t$.

Lemma 4: Let $A$ and $B$ be two independent 2 x 2 orthogonal random
matrices which have folded normal distributions with modes $\hat{A}$ and $\hat{B}$
respectively. Then $AB$ is a 2 x 2 orthogonal random matrix which has a
folded normal distribution with mode equal to $\hat{A}\hat{B}$.

Proof. It is easily seen that there exist unique real-valued normal random
variables $a$ and $b$ such that $\mathcal{E}a, \mathcal{E}b \in [-\pi, \pi)$, $A = \exp Ra$, and $B = \exp Rb$.
Then $AB = \exp R(a+b)$. Obviously $a+b$ is a normal random variable. Hence
$AB$ is folded normal and the mode of $AB$ is $\exp[R\,\mathcal{E}(a+b)] = \exp[R\,\mathcal{E}(a)]
\exp[R\,\mathcal{E}(b)] = \hat{A}\hat{B}$.

3.5 Multichannel Estimation

The  results  of  the  previous  subsections  can  be  extended  to  a  large 
class  of  problems  --  those  involving  processes  evolving  on  abelian  Lie 
groups.  It  is  well  known  (ref.  23)  that  a  given  abelian  Lie  group  G  is 
isomorphic  to  the  direct  product  of  a  number  of  copies  of  the  circle  and  a 
number of copies of the real line, i.e.

$$G \cong R^n \times (S^1)^m$$

where $(S^1)^m$ is usually called a "torus". The diffusion processes on this
type of space have been used to model some interesting satellite and
pendulum systems in ref. 46. Analogous to (28), a bijective mapping
$J_{nm}: (C_1^s)^{n+m} \to (C_1^s)^n \times (C_2^s)^m$ is defined by

$$(J_{nm}(a))(t) = \bigl[a_1(t), \ldots, a_n(t), (J(a_{n+1}))(t), \ldots, (J(a_{n+m}))(t)\bigr] \tag{83}$$

for $a \in (C_1^s)^{n+m}$, $a_i$ being the $i$th component of $a$. Thus a continuous random
signal process on $G$ which is described by an $\mathcal{A}$-measurable function
$X: \Omega \to (C_1^s)^n \times (C_2^s)^m$ corresponds to a unique continuous random signal
process on $R^{n+m}$ which is described by an $\mathcal{A}$-measurable function
$x: \Omega \to (C_1^s)^{m+n}$ such that

$$X(t) = (J_{nm}(x))(t), \qquad t \in [0, s]. \tag{84}$$

The mathematical model for the sensor can be obtained by first using
$J_{nm}$ to inject the following vector random differential equation into
$R^n \times (S^1)^m$

$$dz(t) = m(x(t), t)\,dt + dv(t), \qquad z(0) = 0 \tag{85}$$

and then differentiating $Z(t) = (J_{nm}(z))(t)$ by the stochastic differentiation
rule to obtain a set of stochastic differential equations of which the first $n$
equations are the same as the first $n$ equations of (85) and the last $m$
equations are bilinear 2 x 2 matrix differential equations in the form of
(32). This calculation is straightforward and thus we will not display those
sensor equations. Because of the bijective property of $J_{nm}$, it is clear
that the estimation analysis in the previous subsections can be easily
generalized to this general abelian case with little modification. For the
special case in which $x$ is a linear diffusion and $m(x(t), t)$ is a linear
function of $x(t)$, what has been shown simply asserts that the domain of the
celebrated Kalman-Bucy filter includes estimation on abelian Lie groups.

3.6 Examples

To  illustrate  the  ideas  of  the  preceding  discussions,  we  present  the 
following  examples. 

Example 1: Consider a cylindrical shaft of unit radius being spun about its
longitudinal axis by an electric motor. We assume that the total rotation
of the shaft, $x_1$, is related to the driving force $u$ by the differential
equation

$$\ddot{x}_1 + \dot{x}_1 + x_1 = u$$

with both $x_1(0)$ and $\dot{x}_1(0)$ equal to zero. The last term on the
left-hand side of this equation can be thought of as a torsional spring
effect, which helps to stabilize the servo loop that drives the shaft.
The driving force $u$ consists of a known driving force and a disturbance.
The known driving force adds neither difficulty to the analysis nor
complexity to the solution. Thus, for simplicity, we assume that the
known driving force is zero and that the disturbance is white Gaussian
noise -- i.e. $u = \dot{v}$, where $v$ is a standard one-dimensional Brownian
motion. Setting $x_2 = \dot{x}_1$ we obtain the vector stochastic differential
equation

$$dx(t) = Ax(t)\,dt + B\,dv(t), \qquad x(0) = 0,$$

where

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad A = \begin{bmatrix} 0 & 1 \\ -1 & -1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

Suppose we wish to estimate the orientation of the shaft. The
orientation is determined by the quantities $\sin x_1(t) = \sin \int_0^t x_2(\tau)\,d\tau$ and
$\cos x_1(t) = \cos \int_0^t x_2(\tau)\,d\tau$. Suppose also that we have some means of
measuring these quantities, but that noise corrupts the measurements, so
that our actual measurements are $z_1(t) = \cos\bigl(\int_0^t x_2(\tau)\,d\tau + w(t)\bigr)$ and
$z_2(t) = \sin\bigl(\int_0^t x_2(\tau)\,d\tau + w(t)\bigr)$, where $w$ is a standard Brownian motion
process independent of $v$. Using the Ito differential rule, we
obtain the sensor equations

$$dz_1(t) = -\tfrac{1}{2} z_1(t)\,dt - x_2(t)z_2(t)\,dt - z_2(t)\,dw(t); \qquad z_1(0) = 1$$

$$dz_2(t) = -\tfrac{1}{2} z_2(t)\,dt + x_2(t)z_1(t)\,dt + z_1(t)\,dw(t); \qquad z_2(0) = 0$$

Using the results of this section, we have the following optimal
filtering equations

$$d\hat{x}(t \mid t) = A\hat{x}(t \mid t)\,dt + K(t)c'\bigl[z_1(t)\,dz_2(t) - z_2(t)\,dz_1(t) - c\hat{x}(t \mid t)\,dt\bigr], \qquad \hat{x}(0 \mid 0) = 0$$

where

$$c = [0, 1]$$

and $K$ is the 2 x 2 solution of

$$\dot{K}(t) = AK(t) + K(t)A' - K(t)c'cK(t) + BB'; \qquad K(0) = 0$$

Finally, the optimal estimate of the orientation -- i.e. the optimal estimate
of

$$X_1(t) = \exp(x_1(t)R) = \begin{bmatrix} \cos x_1(t) & \sin x_1(t) \\ -\sin x_1(t) & \cos x_1(t) \end{bmatrix}$$

-- is

$$\hat{X}_1(t \mid t) = \exp\bigl(\hat{x}_1(t \mid t)R\bigr)$$

The steady state filter has the same form as the time-varying filter,
but $K(t)$ is replaced by the positive definite solution of the algebraic
Riccati equation

$$AK + KA' - Kc'cK + BB' = 0$$

If we formally divide $dz_1$ and $dz_2$ by $dt$ and take $\dot{z}_1$ and $\dot{z}_2$
to be our measurements, we get the following block diagram (Figure 5)
for the signal process, observation process, nonlinear preprocessor,
optimal filter, and nonlinear postprocessor.


Example 2: In this example, the nonlinear signal process and the nonlinear
observation process of a certain system turn out to be processes taking
values on the abelian Lie groups $S^1 \times R^2$ and $S^1 \times R^1$, respectively. The
signal process is four-dimensional, satisfying


dx. 


lXldt-x2x3dt-x2dv 


dx2  , 


^  x2dt  +  XjX^dt  +  Xjdv 


-  J  x3(s)ds  +  v 


x.  x-  +  x. 

4  3  4 


x,(0)  -  1,  x?(0)  ^  x.(0)  --  x.(0)  -  0 


The  sensor  equations  are 


dzl  '  "  \  zldt  *  (2x4  *  x3  "  fx j(s)dx2(s)  +  J  x2(s)dx.(s))  z2dt  -  z-^dwj 


dz


2 


-  ~  z^dt  4-  (2x4  4-  x3  -  J  x1(s)dx2(s)  +  j  x2(s)dxj(s)) 
•  Zjdt  4-  ZjdWj 


where $w_1$ and $w_2$ are standard Brownian motions independent of each
other and of $v$. Our problem is to find the least-squares estimate $\hat{x}$ under
the constraint $\hat{x}_1^2 + \hat{x}_2^2 = 1$. Rearrangements of the first two signal
equations yield


Comparing this equation with (54), we see that its solution describes a
rotational process with a single degree of freedom. Let $y(t)$ denote the
total rotation completed at $t$. Then $x_1(t) = \cos y(t)$, $x_2(t) = \sin y(t)$,
$dy = x_3\,dt + dv = x_1\,dx_2 - x_2\,dx_1$, and the first two sensor equations become,
after some rearrangements,


We note that the system is not observable with just the $S^1$
observation pair $\{z_1, z_2\}$ or with just the $R^1$ observation $z_3$, but that
the system is observable when both observation processes are present.
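Following the approach developed in this section, we first obtain the
following optimal filtering equations:

$$d\begin{bmatrix} \hat{y} \\ \hat{x}_3 \\ \hat{x}_4 \end{bmatrix} = A \begin{bmatrix} \hat{y} \\ \hat{x}_3 \\ \hat{x}_4 \end{bmatrix} dt + KC' \left( \begin{bmatrix} z_1\,dz_2 - z_2\,dz_1 \\ dz_3 \end{bmatrix} - C \begin{bmatrix} \hat{y} \\ \hat{x}_3 \\ \hat{x}_4 \end{bmatrix} dt \right)$$

$$[\hat{y}(0), \hat{x}_3(0), \hat{x}_4(0)] = 0$$

where

$$\dot{K} = AK + KA' - KC'CK + BB', \qquad K(0) = 0$$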

1 


0 

1 

o' 

1 

A  = 

-1 

0 

0 

II 

« 

r\ 

0 

0 

1 

1 

0 

c  = 


1  1  2 
1  0  0 


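With help of previous results, we see that

$$\mathcal{E}\bigl(1 - \cos(y - \hat{y})\bigr) \le \mathcal{E}\bigl(1 - \cos(y - \xi)\bigr)$$

for all $z^t$-measurable $\xi$. Hence

$$\tfrac{1}{2}\mathcal{E}\bigl[(\cos y - \cos\hat{y})^2 + (\sin y - \sin\hat{y})^2\bigr] = 1 - \mathcal{E}[\cos(y - \hat{y})] \le 1 - \mathcal{E}[\cos(y - \xi)] = \tfrac{1}{2}\mathcal{E}\bigl[(\cos y - \cos\xi)^2 + (\sin y - \sin\xi)^2\bigr]$$

for all $z^t$-measurable $\xi$. This shows that the least-squares estimates $\hat{x}_1$
and $\hat{x}_2$ under the constraint $\hat{x}_1^2 + \hat{x}_2^2 = 1$ are given by

$$\hat{x}_1 = \cos\hat{y}$$

$$\hat{x}_2 = \sin\hat{y}$$

The block diagram of the optimal filter is given in Figure 6.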
4. Discrete Time Estimation

We now wish to examine the problem of estimating a random
process on $S^1$, given a series of discrete measurements. A natural
model for the measurement process is a discrete approximation to
the continuous measurement process discussed in the preceding section.
We approximate the continuous measurement process

$$dz(t) = m(x(t), t)\,dt + \sqrt{q(t)}\,dw(t)$$

$$Z(t) = (J(z))(t)$$

by the discrete equations

$$\Delta y_k = y_k - y_{k-1} = m_k(x_k)\Delta t + \sqrt{q_k}\,\Delta w_k$$

$$Y_k = \exp(y_k R)$$

where $\Delta t$ is the inter-measurement time, $x_k = x(k\Delta t)$, $q_k = q(k\Delta t)$,
$m_k(\cdot) = m(\cdot, k\Delta t)$, and $\Delta w_k = w(k\Delta t) - w((k-1)\Delta t)$.
We can rewrite the equation as

$$Y_k = Y_{k-1} \exp(\Delta y_k R), \tag{86}$$

and we see that, given the measurements $Y_1, \ldots, Y_{k-1}$, the new
information contained in $Y_k$ is equivalent to the new information in
$Y_{k-1}^{-1} Y_k$. This information is easily seen to be equivalent to the
knowledge of

$$\Delta\gamma_k = \Delta y_k \bmod 2\pi, \tag{87}$$

where we adopt the convention $\Delta\gamma_k \in [-\pi, \pi)$.

It is here that we see a marked difference between the discrete
and continuous problems. In the continuous time problem, the
continuity of the stochastic processes results in our knowing $dy(t)$,
not just $dy(t) \bmod 2\pi$. However, in the discrete problem, the
ambiguity associated with our lack of knowledge of the number of
rotations that occur in the $\Delta t$ between measurements is reflected
in the fact that our information is just $\Delta y_k \bmod 2\pi$.
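The loss of information in the discrete case is easy to see numerically. The following sketch (helper names are illustrative, not from the report) shows that two increments differing by a full turn produce exactly the same pair of consecutive measurements, so only the wrapped increment can be recovered.

```python
import numpy as np

def rot(angle):
    """Rotation matrix exp(angle * R) with R = [[0, 1], [-1, 0]]."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, s], [-s, c]])

def wrap(angle):
    """Reduce an angle modulo 2*pi into the interval [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi

def observed_increment(Y_prev, Y_next):
    """Angle recoverable from two consecutive measurements.

    Y_prev' @ Y_next equals exp(dy R), which determines dy only modulo 2*pi.
    """
    D = Y_prev.T @ Y_next
    return np.arctan2(D[0, 1], D[0, 0])
```

For example, `observed_increment(rot(1.0), rot(1.3))` and `observed_increment(rot(1.0), rot(1.3 + 2 * np.pi))` both return approximately 0.3, although the second shaft has completed one extra rotation.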

With this discretization as motivation, in subsection 4.1 we will formulate a class of single stage estimation problems on $S^1$, and will derive conditional density equations that lend themselves to a relatively simple physical interpretation when considered alongside the preceding comments. In addition, extensions to the multistage problem are discussed.

The results of this subsection provide a striking example of a class of systems for which the continuous time problem is decidedly less complex than the discrete time problem. Thus practical suboptimal schemes are necessary in the discrete time case. To this end, an appendix has been included, in which the relationship between the discrete and continuous problems is discussed. Motivated by this discussion, several suboptimal schemes for the discrete problem are discussed at the end of subsection 4.1.

In subsection 4.2, we will use Fourier series analysis to study a more general discrete time estimation problem on the circle. The form of the conditional density equations will suggest a simple method for designing suboptimal filters for any estimation problem on $S^1$.


4.1 Conditional Distributions on $S^1$ and Optimal Estimation

Suppose we are given a random variable $x$, taking values in $R^1$, with a priori density $p_x(\alpha)$. We can "project" this variable onto the circle by the equation

$$\theta = x \bmod 2\pi.$$

The a priori density (with respect to the standard (Haar) measure on $S^1$) for $\theta$ is given by the associated projection map

$$p_\theta(\alpha) = \sum_{n=-\infty}^{+\infty} p_x(\alpha + 2n\pi); \qquad \alpha \in [-\pi, \pi).$$

We suppose that a measurement of the form

$$\tilde{y} = (m(x) + v) \bmod 2\pi; \qquad \tilde{y} \in [-\pi, \pi)$$

is taken, where $v$ is a random variable on $R^1$, independent of $x$ (and thus $\theta$), with density $p_v(\nu)$, and $m: R^1 \to R^1$ is a Borel measurable function. We also define the auxiliary, unobtainable "measurement"

$$y = m(x) + v$$

which has a density function given by

$$p_y(\beta) = \int_{-\infty}^{+\infty} p_v(\beta - m(u))\, p_x(u)\, du.$$
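The projection map can be checked numerically with a short Python sketch. The normal prior, the truncation level `n_terms`, and the helper names are assumptions made for this illustration only.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def wrapped_density(alpha, p_x, n_terms=20):
    """Truncated version of p_theta(alpha) = sum_n p_x(alpha + 2*pi*n)."""
    return sum(p_x(alpha + 2.0 * math.pi * n) for n in range(-n_terms, n_terms + 1))

def integrate_over_circle(f, points=2000):
    """Midpoint rule on [-pi, pi)."""
    h = 2.0 * math.pi / points
    return h * sum(f(-math.pi + (j + 0.5) * h) for j in range(points))

# Hypothetical prior on the real line: x ~ N(1, 4).
p_x = lambda u: normal_pdf(u, 1.0, 4.0)
total = integrate_over_circle(lambda al: wrapped_density(al, p_x))
```

The value `total` should be numerically indistinguishable from one, which confirms that the folded density is a proper density on the circle.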


We wish to compute the conditional density $p_{\theta|\tilde{y}}(\alpha|\beta)$, or, equivalently, the density $p_{x|\tilde{y}}(\alpha|\beta)$. Also, we wish to know if this density has a particularly nice form if $m$ is linear and $x$ and $v$ are normally distributed.

We  will  derive  somewhat  more  general  results,  and  will  apply 
them  to  this  problem.  The  arguments  in  this  section  are  measure- 
theoretic  in  nature,  and  are  summarized  in  the  statements  of 
Theorem  7,  Theorem  8,  and  their  corollaries.  The  solution  to 
the  specific  problem  stated  above  is  given  in  the  statements  of  the 
two  corollaries  to  Theorem  8. 

We consider the probability space $(R^1, \mathcal{A}^1, P_y)$, where $\mathcal{A}^1$ is the $\sigma$-algebra of Borel measurable subsets of $R^1$, and $P_y$ is any probability measure on $\mathcal{A}^1$. We define two random variables on this space:

$$y(\omega) = \omega$$

$$\tilde{y}(\omega) = \omega \bmod 2\pi \qquad (\tilde{y} \in [-\pi, \pi))$$

Then $\mathcal{A}_y$, the $\sigma$-field generated by $y$, is $\mathcal{A}^1$, and $\mathcal{A}_{\tilde{y}}$ (defined analogously) consists of the following sets:

$$\tilde{A} \in \mathcal{A}_{\tilde{y}} \iff \tilde{A} \cap [-\pi, \pi) = A \in \mathcal{S} \ \text{ and } \ \tilde{A} = \bigcup_{n=-\infty}^{+\infty} (A + 2n\pi),$$

i.e. $\tilde{A}$ is "periodic" in form and thus is determined by $A$.

We define a sequence of measures on the measurable space $([-\pi, \pi), \mathcal{S})$, where $\mathcal{S}$ is the $\sigma$-field of Borel subsets of $[-\pi, \pi)$:

$$P_y^n(S) = P_y(S + 2n\pi)$$

$$P_{\tilde{y}}(S) = \sum_{n=-\infty}^{+\infty} P_y^n(S), \qquad S \in \mathcal{S}.$$

Clearly $P_{\tilde{y}}(S) = 0 \Rightarrow P_y^n(S) = 0$, which means $P_y^n$ is absolutely continuous with respect to $P_{\tilde{y}}$ ($P_y^n \ll P_{\tilde{y}}$), and thus, by the Radon-Nikodym Theorem, [24], [25], for each $n$ there exists an $\mathcal{S}$-measurable function $dP_y^n/dP_{\tilde{y}}$ such that

$$P_y^n(S) = \int_S \frac{dP_y^n}{dP_{\tilde{y}}}(\omega)\, dP_{\tilde{y}}(\omega) \qquad \forall\, S \in \mathcal{S}.$$

We wish to compute the conditional probability measure $P_{y|\tilde{y}}$ for $y$ given $\tilde{y}$. The following theorem shows that this conditional measure can be expressed in a form that reflects our uncertainty as to the number of integral multiples of $2\pi$ that separate the values of $y$ and $\tilde{y}$.

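Theorem 7: The conditional probability measure

$$P_{y|\tilde{y}}(C|\beta) = P(y \in C \mid \tilde{y} = \beta); \qquad \beta \in [-\pi, \pi),\ C \in \mathcal{A}^1$$

can be expressed in the form

$$P_{y|\tilde{y}}(C|\beta) = \sum_{n=-\infty}^{+\infty} \chi_C(\beta + 2n\pi)\, \frac{dP_y^n}{dP_{\tilde{y}}}(\beta)$$

where $\chi_C$ is the characteristic function for $C$.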


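Proof: From the definition of conditional expectation, [26], [27], [28], we have that $P_{y|\tilde{y}}(C|\beta)$ is the unique (up to with-probability-one equivalence) $\mathcal{A}_{\tilde{y}}$-measurable function such that for any $\tilde{A} \in \mathcal{A}_{\tilde{y}}$

$$\int_{\tilde{A}} \chi_C(\omega)\, dP_y(\omega) = \int_{\tilde{A}} P_{y|\tilde{y}}(C|\tilde{y}(\omega))\, dP_y(\omega). \tag{89}$$

The left-hand side of (89) equals

$$\sum_{n=-\infty}^{+\infty} \int_{A+2n\pi} \chi_C(\omega)\, dP_y(\omega) = \sum_{n=-\infty}^{+\infty} \int_A \chi_C(\xi + 2n\pi)\, dP_y^n(\xi)$$

where $A = \tilde{A} \cap [-\pi, \pi)$. This last equality follows from the definition of $P_y^n$ and the obvious relationships among $\mathcal{A}^1$- and $\mathcal{S}$-measurability.

Now $P_y^n \ll P_{\tilde{y}}$, so

$$\sum_{n=-\infty}^{+\infty} \int_A \chi_C(\xi + 2n\pi)\, dP_y^n(\xi) = \sum_{n=-\infty}^{+\infty} \int_A \chi_C(\xi + 2n\pi)\, \frac{dP_y^n}{dP_{\tilde{y}}}(\xi)\, dP_{\tilde{y}}(\xi) = \int_A \left[ \sum_{n=-\infty}^{+\infty} \chi_C(\xi + 2n\pi)\, \frac{dP_y^n}{dP_{\tilde{y}}}(\xi) \right] dP_{\tilde{y}}(\xi), \tag{90}$$

where we have used the Radon-Nikodym Theorem, the fact that $dP_y^n/dP_{\tilde{y}} \geq 0$ a.e. $(P_{\tilde{y}})$, and the monotone convergence theorem, [24].

Similarly, the right-hand side of (89) is equal to

$$\sum_{n=-\infty}^{+\infty} \int_{A+2n\pi} P_{y|\tilde{y}}(C|\tilde{y}(\omega))\, dP_y(\omega) = \sum_{n=-\infty}^{+\infty} \int_A P_{y|\tilde{y}}(C|\tilde{y}(\xi))\, dP_y^n(\xi)$$

where we have used the periodicity of $\tilde{y}(\cdot)$. Again using the Radon-Nikodym and monotone convergence theorems, this last expression is equal to

$$\sum_{n=-\infty}^{+\infty} \int_A P_{y|\tilde{y}}(C|\tilde{y}(\xi))\, \frac{dP_y^n}{dP_{\tilde{y}}}(\xi)\, dP_{\tilde{y}}(\xi) = \int_A P_{y|\tilde{y}}(C|\tilde{y}(\xi)) \left[ \sum_{n=-\infty}^{+\infty} \frac{dP_y^n}{dP_{\tilde{y}}}(\xi) \right] dP_{\tilde{y}}(\xi).$$

Clearly $P_{\tilde{y}}$ is a probability measure on $[-\pi, \pi)$, since

$$P_{\tilde{y}}([-\pi, \pi)) = \sum_{n=-\infty}^{+\infty} P_y^n([-\pi, \pi)) = \sum_{n=-\infty}^{+\infty} P_y([-\pi, \pi) + 2n\pi) = P_y((-\infty, \infty)) = 1.$$

Also, on any $\mathcal{S}$-measurable set $S$,

$$P_{\tilde{y}}(S) = \sum_{n=-\infty}^{+\infty} P_y^n(S) = \sum_{n=-\infty}^{+\infty} \int_S \frac{dP_y^n}{dP_{\tilde{y}}}(\xi)\, dP_{\tilde{y}}(\xi) = \int_S \left[ \sum_{n=-\infty}^{+\infty} \frac{dP_y^n}{dP_{\tilde{y}}}(\xi) \right] dP_{\tilde{y}}(\xi),$$

and, since $P_{\tilde{y}}$ is a finite measure, this implies that

$$\sum_{n=-\infty}^{+\infty} \frac{dP_y^n}{dP_{\tilde{y}}}(\xi) = 1 \qquad \text{a.e. } (P_{\tilde{y}}).$$

Thus, the right-hand side of (89) is

$$\int_A P_{y|\tilde{y}}(C|\tilde{y}(\xi))\, dP_{\tilde{y}}(\xi). \tag{91}$$

Comparing (90) and (91) we see that the conditional measure is given by

$$P_{y|\tilde{y}}(C|\tilde{y}(\xi)) = \sum_{n=-\infty}^{+\infty} \chi_C(\xi + 2n\pi)\, \frac{dP_y^n}{dP_{\tilde{y}}}(\xi).$$

But this is defined for $\xi \in [-\pi, \pi)$, and in this case $\tilde{y}(\xi) = \xi$. Thus

$$P_{y|\tilde{y}}(C|\beta) = \sum_{n=-\infty}^{+\infty} \chi_C(\beta + 2n\pi)\, \frac{dP_y^n}{dP_{\tilde{y}}}(\beta).$$

We note that for fixed $\beta$, $P_{y|\tilde{y}}(C|\beta)$ is a sum of Dirac measures concentrated at the points $\beta + 2n\pi$, where

$$P(y = \beta + 2n\pi \mid \tilde{y} = \beta) = \frac{dP_y^n}{dP_{\tilde{y}}}(\beta).$$

Thus, in terms of $\delta$-functions, we can write the conditional "density"

$$p_{y|\tilde{y}}(\xi|\beta) = \sum_{n=-\infty}^{+\infty} \frac{dP_y^n}{dP_{\tilde{y}}}(\beta)\, \delta(\xi - \beta - 2n\pi). \tag{92}$$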


Corollary: Suppose $P_y$ is absolutely continuous with respect to Lebesgue measure $\lambda$:

$$P_y(A) = \int_A p_y(\xi)\, d\lambda(\xi).$$

Then the conditional "density" is given by

$$p_{y|\tilde{y}}(\xi|\beta) = \sum_{n=-\infty}^{+\infty} \frac{p_y(\beta + 2n\pi)}{\sum_{k=-\infty}^{+\infty} p_y(\beta + 2k\pi)}\, \delta(\xi - \beta - 2n\pi) \tag{93}$$

$$= \frac{p_y(\xi)\, \delta(\beta - (\xi \bmod 2\pi))}{\sum_{k=-\infty}^{+\infty} p_y(\xi + 2k\pi)}. \tag{94}$$

Proof: It is easy to see that $P_y^n$ is absolutely continuous with respect to Lebesgue measure (also called $\lambda$) on $[-\pi, \pi)$, and

$$\frac{dP_y^n}{d\lambda}(\eta) = p_y(\eta + 2n\pi)$$

is a version of the Radon-Nikodym derivative. Clearly $p_y \geq 0$ a.e. $(\lambda)$, and thus, by monotone convergence and the finiteness of $P_{\tilde{y}}$,

$$\sum_{k=-\infty}^{+\infty} p_y(\eta + 2k\pi)$$

is finite a.e. $(\lambda)$ and, in fact, is the Radon-Nikodym derivative $dP_{\tilde{y}}/d\lambda$. It is then clear that

$$\frac{dP_y^n}{dP_{\tilde{y}}}(\eta) = \frac{dP_y^n/d\lambda\,(\eta)}{dP_{\tilde{y}}/d\lambda\,(\eta)} = \frac{p_y(\eta + 2n\pi)}{\sum_{k=-\infty}^{+\infty} p_y(\eta + 2k\pi)}.$$

Finally, consider the set where $p_{y|\tilde{y}}(\xi|\beta)$ is undefined, i.e. where

$$\sum_{k=-\infty}^{+\infty} p_y(\beta + 2k\pi) = 0.$$

But this set is a set of $P_{\tilde{y}}$-measure zero. Equation (94) follows immediately from (93) and the properties of the $\delta$-function.

We make the comment that $P_{\tilde{y}}$ is the probability measure for the random variable $\tilde{y}$, and thus, a naive application of Bayes' rule yields

$$p_{y|\tilde{y}}(\xi|\beta) = \frac{p_{\tilde{y}|y}(\beta|\xi)\, p_y(\xi)}{p_{\tilde{y}}(\beta)} = \frac{\delta(\beta - (\xi \bmod 2\pi))\, p_y(\xi)}{\sum_{k=-\infty}^{+\infty} p_y(\beta + 2k\pi)}.$$
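The weights in (93) are the conditional probabilities of the number of rotations hidden by the wrapping. A minimal Python sketch of their computation follows; the normal density chosen for $y$, the truncation level, and the function names are assumptions of the example rather than part of the report.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def wrap_count_probabilities(beta, p_y, n_terms=20):
    """Approximate P(y = beta + 2*pi*n | wrapped measurement = beta) for
    |n| <= n_terms, using the weights p_y(beta + 2*pi*n) / sum_k p_y(beta + 2*pi*k)."""
    weights = {n: p_y(beta + 2.0 * math.pi * n) for n in range(-n_terms, n_terms + 1)}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("all candidate densities vanish at this beta")
    return {n: w / total for n, w in weights.items()}

# Hypothetical density for the unwrapped measurement: y ~ N(7, 9).
p_y = lambda u: normal_pdf(u, 7.0, 9.0)
probs = wrap_count_probabilities(0.5, p_y)
most_likely_n = max(probs, key=probs.get)
```

With these numbers the candidate $0.5 + 2\pi$ lies closest to the mean of $y$, so one full rotation is the most probable explanation of the wrapped reading.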


We now consider a more general version of the problem stated at the beginning of this subsection. We assume that we have a probability space $(R^2, \mathcal{A}^2, P_{xy})$, where $\mathcal{A}^2$ is the Borel field of $R^2$. We define three random variables

$$x(\omega_1, \omega_2) = \omega_1$$

$$y(\omega_1, \omega_2) = \omega_2$$

$$\tilde{y}(\omega_1, \omega_2) = \omega_2 \bmod 2\pi$$

and the marginal distributions

$$P_x(A) = P_{xy}(A \times R^1), \qquad A \in \mathcal{A}^1$$

$$P_y(B) = P_{xy}(R^1 \times B), \qquad B \in \mathcal{A}^1$$

($\mathcal{A}^1$ = Borel measurable subsets of $R^1$).

We let $\mathcal{A}_y$ = the minimum sub-$\sigma$-algebra of $\mathcal{A}^2$ with respect to which $y$ is measurable, and we define $\mathcal{A}_x$ and $\mathcal{A}_{\tilde{y}}$ analogously.

We wish to compute the conditional measure $P_{x|\tilde{y}}$. As before, we obtain a form for this measure that reflects our uncertainty as to the number of multiples of $2\pi$ that are "chopped off" of $y$ in the process of observing $\tilde{y}$. To derive the desired result, we will need to consider two other conditional measures, $P_{x|y,\tilde{y}}$ and $P_{x|y}$. Since $\mathcal{A}_{\tilde{y}} \subset \mathcal{A}_y$ (i.e. $\tilde{y}$ is a deterministic function of $y$), we have

$$P_{x|y,\tilde{y}}(A|\rho, \beta) = P_{x|y}(A|\rho). \tag{95}$$

As before, we define the following measures on $([-\pi, \pi), \mathcal{S})$:

$$P_y^n(S) = P_y(S + 2n\pi)$$

$$P_{\tilde{y}}(S) = \sum_{n=-\infty}^{+\infty} P_y^n(S), \qquad S \in \mathcal{S}.$$


Theorem 8: The conditional distribution $P_{x|\tilde{y}}(C|\beta)$ is given by

$$P_{x|\tilde{y}}(C|\beta) = \sum_{n=-\infty}^{+\infty} \frac{dP_y^n}{dP_{\tilde{y}}}(\beta)\, P_{x|y}(C|\beta + 2n\pi).$$


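Proof: It is easy to see from the previous results that the conditional probability

$$P_{y|\tilde{y}}(y = \alpha \mid \beta) = P_{x,y|\tilde{y}}(x \in R^1,\ y = \alpha \mid \tilde{y} = \beta)$$

exists and is given by

$$P_{y|\tilde{y}}(y = \alpha \mid \beta) = \begin{cases} \dfrac{dP_y^k}{dP_{\tilde{y}}}(\beta) & \alpha = \beta + 2k\pi \\ 0 & \text{otherwise.} \end{cases}$$

Using the properties of iterated expectations, [28], and equation (95),

$$P_{x|\tilde{y}}(C|\beta) = E\left[ P_{x|y}(C|y) \mid \tilde{y} = \beta \right] = \int_{R^2} P_{x|y}(C \mid y(\omega_1, \omega_2))\, dP_{x,y|\tilde{y}}(\omega_1, \omega_2 \mid \tilde{y} = \beta).$$

Since $P_{x|y}(C \mid y(\omega_1, \omega_2))$ is independent of $\omega_1$, we perform the integration with respect to $\omega_1$ first:

$$P_{x|\tilde{y}}(C|\beta) = \int_{R^1} P_{x|y}(C \mid y = \omega_2)\, dP_{y|\tilde{y}}(\omega_2|\beta) = \sum_{n=-\infty}^{+\infty} \frac{dP_y^n}{dP_{\tilde{y}}}(\beta)\, P_{x|y}(C \mid y = \beta + 2n\pi)$$

$$= \sum_{n=-\infty}^{+\infty} P(y = \beta + 2n\pi \mid \tilde{y} = \beta)\, P_{x|y}(C \mid y = \beta + 2n\pi).$$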


The following two corollaries solve the problem posed at the start of this subsection.

Corollary 1: Suppose $x$ and $v$ are independent real valued random variables, and define

$$y = m(x) + v$$

$$\tilde{y} = y \bmod 2\pi$$

where $m: R^1 \to R^1$ is measurable. Also, suppose $p_x(\alpha)$ and $p_v(\nu)$ are the probability densities for $x$ and $v$ respectively. Then a version of the probability density $p_{x|\tilde{y}}(\alpha|\beta)$ is given by

$$p_{x|\tilde{y}}(\alpha|\beta) = \sum_{n=-\infty}^{+\infty} \frac{p_y(\beta + 2n\pi)}{p_{\tilde{y}}(\beta)}\, p_{x|y}(\alpha|\beta + 2n\pi) \tag{97}$$

$$= \sum_{n=-\infty}^{+\infty} \frac{p_{y|x}(\beta + 2n\pi|\alpha)\, p_x(\alpha)}{p_{\tilde{y}}(\beta)}. \tag{98}$$


$$p_v(\nu) = \frac{1}{\sqrt{2\pi\gamma_2}}\, e^{-\nu^2/2\gamma_2} = N(\nu;\, 0, \gamma_2)$$


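Proof: We will use the form of $p_{x|\tilde{y}}(\alpha|\beta)$ given in (97). The additive properties of independent normally distributed random variables, [29], yield

$$p_y(\beta) = N(\beta;\, a\eta,\ a^2\gamma_1 + \gamma_2)$$

and therefore the equation for $c_n(\beta)$ is correct. Then

$$p_{x|\tilde{y}}(\alpha|\beta) = \sum_{n=-\infty}^{+\infty} c_n(\beta)\, p_{x|y}(\alpha|\beta + 2n\pi). \tag{102}$$

But $p_{x|y}$ is the solution of a linear filtering problem, and therefore is a normal distribution. In fact

$$p_{x|y}(\alpha|\beta + 2n\pi) = N(\alpha;\, \eta_n, \gamma_3)$$

where

$$\gamma_3 = \frac{\gamma_1 \gamma_2}{a^2\gamma_1 + \gamma_2}, \qquad \eta_n = \frac{\eta\gamma_2 + a\gamma_1(\beta + 2n\pi)}{a^2\gamma_1 + \gamma_2}.$$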

That is, the $n$th term in the series in (102) is evaluated by an optimal linear estimator which takes as its measurement $\beta + 2n\pi$. We also note that if the initial distribution $p_x(\alpha)$ is an infinite sum of normal densities with means $\eta_j$, then the density $p_{x|\tilde{y}}(\alpha|\beta)$ is a doubly infinite sum of normal densities, with means computed by optimal linear estimators, the $(j,k)$th of which takes as its initial mean $\eta_j$ and as its measurement $\beta + 2k\pi$. Again, the coefficients are nonlinear functions of the measurement.
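The structure of (102) can be sketched in Python as a bank of scalar linear estimators, one per candidate number of rotations. The sketch assumes the linear-Gaussian setting used in the proof above ($m(x) = ax$, prior $N(\eta, \gamma_1)$, noise $N(0, \gamma_2)$); the function names and the truncation level `n_terms` are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def wrapped_linear_posterior(beta, eta, gamma1, a, gamma2, n_terms=10):
    """Return (components, gamma3), where components is a list of
    (c_n, eta_n) pairs for |n| <= n_terms and gamma3 is the common variance."""
    s = a * a * gamma1 + gamma2            # variance of y = a*x + v
    gamma3 = gamma1 * gamma2 / s           # posterior variance of each term
    raw = []
    for n in range(-n_terms, n_terms + 1):
        y_n = beta + 2.0 * math.pi * n     # candidate unwrapped measurement
        weight = normal_pdf(y_n, a * eta, s)
        eta_n = (eta * gamma2 + a * gamma1 * y_n) / s
        raw.append((weight, eta_n))
    total = sum(w for w, _ in raw)         # truncated version of p_ytilde(beta)
    return [(w / total, m) for w, m in raw], gamma3

def posterior_density(alpha, components, gamma3):
    """Evaluate the truncated Gaussian sum at alpha."""
    return sum(c * normal_pdf(alpha, m, gamma3) for c, m in components)

components, gamma3 = wrapped_linear_posterior(
    beta=0.1, eta=6.0, gamma1=1.0, a=1.0, gamma2=0.5)
```

Each pair in `components` is one linear-estimator output together with its weight $c_n(\beta)$; evaluating `posterior_density` gives the truncated conditional density of $x$.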

Once having the solution $p_{x|\tilde{y}}(\alpha|\beta)$, we can compute $p_{\theta|\tilde{y}}(\alpha|\beta)$. If the hypotheses of Corollary 2 are satisfied, $p_{\theta|\tilde{y}}(\alpha|\beta)$ is an infinite sum of folded normal densities.

An interpretation of the form of the conditional density is readily available. The infinite summation is a result of the "mod $2\pi$" ambiguity in the measurement. The $n$th term in the sum is the linear result if the measurement were $y = \beta + 2n\pi$, while, as derived in Theorem 8 and its corollary, the coefficient $c_n(\beta)$ is just $P(y = \beta + 2n\pi \mid \tilde{y} = \beta)$ -- i.e. it is related to the difference between $y$ and $\tilde{y}$ expressed in multiples of $2\pi$.

Thus, the terms corresponding to the more likely values of $y$ -- the more likely number of multiples of $2\pi$ -- are more heavily weighted. Thus, one could consider approximating $p_{x|\tilde{y}}(\alpha|\beta)$ (and thus $p_{\theta|\tilde{y}}(\alpha|\beta)$) by a finite sum of normal distributions, where we must devise a procedure for deciding which terms to keep. Some work involving this type of approximation has been done by Buxbaum and Haddad, [30]. Such a procedure is certainly necessary if $x$ is a random process instead of a random variable and we take a sequence of measurements, since, by a simple inductive argument, after $M$ measurements our conditional density consists of $M$ infinite sums of normal densities. Note that all the normal densities have the same variance.


In the particular case in which $m$ is linear, we see that the conditional density is of the form

$$p(\theta) = \sum_{n=-\infty}^{+\infty} c_n F(\theta;\, \eta_n, \gamma) \tag{103}$$

which is precisely the form studied in Section 2 (see equation (21)). Thus, the estimation and error analysis results of that section apply here. These results will also apply if we approximate (103) by a finite sum, and, since the truncation procedures of Buxbaum and Haddad and the estimation equations of Section 2 both lead to simple algorithms, this approach leads to easily implemented filter equations.

We remark that the appendix to this report contains results relating the discrete and continuous problems, by showing that as the time between measurements, $\Delta t$, becomes small, the terms in the conditional density corresponding to a nonzero number, $n$, of rotations between measurements go to zero exponentially in $1/\Delta t$ and in $n^2$. Thus we see that if the inter-measurement time is small, a rather crude truncation procedure -- one that keeps only a few terms, corresponding to one or two rotations -- will provide adequate accuracy.
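The decay of the nonzero-rotation terms can be illustrated with a small Python sketch. It is not the appendix argument: it assumes that the unwrapped increment is $N(0, q\,\Delta t)$, which ignores the signal term $m_k(x_k)\Delta t$ of the discrete model, and it requires $\beta \in [-\pi, \pi)$. The function name is hypothetical.

```python
import math

def nonzero_rotation_probability(beta, q, dt, n_terms=20):
    """P(n != 0 | wrapped increment = beta) when the unwrapped increment is
    N(0, q*dt). Assumes beta in [-pi, pi) so that every exponent is <= 0."""
    var = q * dt
    tail = 0.0
    for n in range(-n_terms, n_terms + 1):
        if n == 0:
            continue
        # Weight of the n-th term relative to the n = 0 term.
        exponent = -((beta + 2.0 * math.pi * n) ** 2 - beta ** 2) / (2.0 * var)
        tail += math.exp(exponent)
    return tail / (1.0 + tail)

# Shrinking the inter-measurement time makes extra rotations rapidly negligible.
probabilities = [nonzero_rotation_probability(0.5, 1.0, dt) for dt in (4.0, 1.0, 0.25)]
```

The relative weight of the $n$th term is $\exp\{-[(\beta + 2n\pi)^2 - \beta^2]/(2q\Delta t)\}$, whose exponent grows like $n^2/\Delta t$, in line with the behaviour described above.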

In addition to the method of truncating the infinite series in some systematic manner, another suboptimal estimation scheme is suggested by the results of the appendix. Since for small $\Delta t$ the difference between the continuous and discrete time solutions is small, why can't we use the continuous time results in designing a suboptimal discrete-time filter? That is, we can design the continuous time filter and use as an input the discrete time measurements, which we hold constant over the interval between measurements.

We have not attempted to describe in detail the design of these various suboptimal estimation schemes, nor to analyze their performance, but rather we have only meant to indicate possible alternatives. Clearly further analysis and some simulation results are necessary before we can decide on the validity of these different approximations. The conceptual ideas behind these two basic methods are depicted in Figures 7 and 8.

Analogous to the discussion at the end of Section 3, we can extend the results of the present section to problems on arbitrary abelian Lie groups. Let $x$ be a random variable on $R^{n+m}$ with probability density $p_x(\alpha_1, \ldots, \alpha_{n+m})$, and consider the associated random variable, $\tilde{x}$, on $R^n \times (S^1)^m$ defined by the map from $R^{n+m}$ into $R^n \times (S^1)^m$ given by

$$(x_1, \ldots, x_{n+m}) \mapsto (x_1, \ldots, x_n,\ x_{n+1} \bmod 2\pi,\ \ldots,\ x_{n+m} \bmod 2\pi).$$

Then the density $p_{\tilde{x}}(\alpha_1, \ldots, \alpha_n, \theta_1, \ldots, \theta_m)$ is given by

$$p_{\tilde{x}}(\alpha_1, \ldots, \alpha_n, \theta_1, \ldots, \theta_m) = \sum_{k_1=-\infty}^{+\infty} \cdots \sum_{k_m=-\infty}^{+\infty} p_x(\alpha_1, \ldots, \alpha_n,\ \theta_1 + 2k_1\pi,\ \ldots,\ \theta_m + 2k_m\pi).$$

Figure 8: Illustrating the Concept of Using the Continuous-Time Filter to Approximate the Discrete-Time Filter

If $p_x$ is a multidimensional normal density, then $p_{\tilde{x}}$ is called an $(n, m)$ normal density -- $n$ referring to the number of marginal densities which are normal and $m$ to the number of folded normal marginal densities.

It is easy to see that minor changes in the arguments of this section lead to the following conclusion: let $C: R^{n+m} \to R^k$ be a linear map and $w$ a $k$-dimensional normal random variable independent of $x$, an $(n+m)$-dimensional normal random variable. Consider the random variable $y$ defined by

$$y = Cx + w$$

and define the associated random variable $\tilde{y}$ by

$$\tilde{y}_i = y_i, \qquad 1 \leq i \leq k_1,$$

$$\tilde{y}_i = y_i \bmod 2\pi, \qquad k_1 < i \leq k.$$

Then the conditional density $p_{x|\tilde{y}}$ can be written as a $(k - k_1)$-times countably infinite sum of normal distributions, the $(r_1, \ldots, r_{k-k_1})$th of these being the linear result if

$$y_{k_1+i} = \tilde{y}_{k_1+i} + 2r_i\pi, \qquad i = 1, \ldots, k - k_1 \tag{104}$$

and the coefficient of this term is just the conditional probability for equation (104) to hold, given the random variable $\tilde{y}$.

4.2 The Discrete Measurement Problem Using Fourier Series

As was seen in subsection 2.2, Fourier series can be a useful tool. In this subsection we will use it to aid in analyzing a rather general discrete-time estimation problem on $S^1$. Again, we will consider the single measurement case. Extension to the multistage process with measurement noise independent from stage to stage is immediate.

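We consider the problem of taking a measurement of a random variable, $\theta$, on the circle with a priori density

$$p_\theta(\xi) = \frac{1}{2\pi} + \sum_{n=1}^{\infty} \left( a_n \sin n\xi + b_n \cos n\xi \right).$$

We assume that we take a single (possibly nonlinear) measurement, $y$, of $\theta$, and that the conditional density $p_{y|\theta}(\beta|\xi)$ exists. Considering this as a function of $\xi$ for fixed $\beta$, we must have

$$p_{y|\theta}(\beta|\xi + 2\pi) = p_{y|\theta}(\beta|\xi).$$

Thus, we can write $p_{y|\theta}(\beta|\xi)$ in Fourier series form in $\xi$ for fixed $\beta$:

$$p_{y|\theta}(\beta|\xi) = d_0(\beta) + \sum_{n=1}^{\infty} \left( c_n(\beta) \sin n\xi + d_n(\beta) \cos n\xi \right)$$

where the $c$'s and $d$'s are functions of $\beta$. An application of Bayes' rule yields the Fourier series form for the conditional density $p_{\theta|y}(\xi|\beta)$:

$$p_{\theta|y}(\xi|\beta) = \frac{1}{2\pi} + \sum_{n=1}^{\infty} \left( a_n(\beta) \sin n\xi + b_n(\beta) \cos n\xi \right) \tag{105}$$

where

$$a_n(\beta) = \frac{\alpha_n(\beta)}{2\pi c(\beta)}, \qquad b_n(\beta) = \frac{\beta_n(\beta)}{2\pi c(\beta)} \tag{106}$$

$$c(\beta) = \frac{1}{2\pi}\, p_y(\beta) = \frac{d_0(\beta)}{2\pi} + \frac{1}{2} \sum_{n=1}^{\infty} \left[ a_n c_n(\beta) + b_n d_n(\beta) \right] \tag{107}$$

$$\alpha_k(\beta) = a_k d_0(\beta) + \frac{c_k(\beta)}{2\pi} + \frac{1}{2} \sum_{n=1}^{k-1} \left[ a_n d_{k-n}(\beta) + b_n c_{k-n}(\beta) \right] + \frac{1}{2} \sum_{n=1}^{\infty} \left\{ \left[ a_{n+k} d_n(\beta) + b_n c_{n+k}(\beta) \right] - \left[ a_n d_{n+k}(\beta) + b_{n+k} c_n(\beta) \right] \right\} \tag{108}$$

$$\beta_k(\beta) = b_k d_0(\beta) + \frac{d_k(\beta)}{2\pi} + \frac{1}{2} \sum_{n=1}^{k-1} \left[ b_n d_{k-n}(\beta) - a_n c_{k-n}(\beta) \right] + \frac{1}{2} \sum_{n=1}^{\infty} \left\{ \left[ a_{n+k} c_n(\beta) + b_{n+k} d_n(\beta) \right] + \left[ a_n c_{n+k}(\beta) + b_n d_{n+k}(\beta) \right] \right\} \tag{109}$$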


Note that the equations for $c$, $\alpha_k$, and $\beta_k$ are bilinear in the Fourier coefficients of $p_\theta(\xi)$ and $p_{y|\theta}(\beta|\xi)$, and one should note the marked similarity in the structure of (108) and (109). Thus, the computation of $p_{\theta|y}(\xi|\beta)$ involves the (in general nonlinear) computation of the coefficients $\{c_n(\beta)\}$ and $\{d_n(\beta)\}$ and the evaluation of the bilinear equations (107), (108), and (109).
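The bilinear update can be sketched in Python for finitely many nonzero coefficients. The formulas coded below are those obtained by multiplying the series for $p_\theta$ and $p_{y|\theta}$ term by term, which is how (107) through (109) are read here; the list layout (index 0 unused) and all function names are assumptions of this sketch.

```python
import math

TWO_PI = 2.0 * math.pi

def coef(seq, n):
    """seq[n] for 1 <= n < len(seq), else 0; seq[0] is unused padding."""
    return seq[n] if 1 <= n < len(seq) else 0.0

def series(xi, const, s, c):
    """const + sum_n (s[n] sin(n xi) + c[n] cos(n xi))."""
    return const + sum(s[n] * math.sin(n * xi) + c[n] * math.cos(n * xi)
                       for n in range(1, len(s)))

def bilinear_update(a, b, d0, c, d):
    """a, b: sine/cosine coefficients of the prior; d0, c, d: constant, sine,
    cosine coefficients of the likelihood in xi for the observed beta.
    Returns c_beta and the unnormalized lists alpha, beta up to mode N + M."""
    k_max = (len(a) - 1) + (len(c) - 1)
    rng = range(1, k_max + 1)
    c_beta = d0 / TWO_PI + 0.5 * sum(
        coef(a, n) * coef(c, n) + coef(b, n) * coef(d, n) for n in rng)
    alpha, beta = [0.0], [0.0]
    for k in rng:
        low_a = sum(coef(a, n) * coef(d, k - n) + coef(b, n) * coef(c, k - n)
                    for n in range(1, k))
        high_a = sum(coef(a, n + k) * coef(d, n) + coef(b, n) * coef(c, n + k)
                     - coef(a, n) * coef(d, n + k) - coef(b, n + k) * coef(c, n)
                     for n in rng)
        alpha.append(coef(a, k) * d0 + coef(c, k) / TWO_PI + 0.5 * (low_a + high_a))
        low_b = sum(coef(b, n) * coef(d, k - n) - coef(a, n) * coef(c, k - n)
                    for n in range(1, k))
        high_b = sum(coef(a, n + k) * coef(c, n) + coef(b, n + k) * coef(d, n)
                     + coef(a, n) * coef(c, n + k) + coef(b, n) * coef(d, n + k)
                     for n in rng)
        beta.append(coef(b, k) * d0 + coef(d, k) / TWO_PI + 0.5 * (low_b + high_b))
    return c_beta, alpha, beta

# Hypothetical coefficients: N = 2 prior modes, M = 3 likelihood modes.
a, b = [0.0, 0.05, -0.02], [0.0, 0.10, 0.03]
d0, c, d = 0.4, [0.0, 0.10, 0.0, 0.05], [0.0, -0.20, 0.10, 0.0]
c_beta, alpha, beta = bilinear_update(a, b, d0, c, d)
posterior_sine = [v / (TWO_PI * c_beta) for v in alpha]
posterior_cosine = [v / (TWO_PI * c_beta) for v in beta]
```

The returned lists have $N + M = 5$ modes, which is the growth in memory that motivates sequential truncation in a multistage process.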

The form of these conditional density equations suggests a truncation of the Fourier series for $p_\theta$ and $p_{y|\theta}$, which leads to finite sums in (107) through (109). However, if we retain the first $N$ modes of $p_\theta$ and the first $M$ modes of $p_{y|\theta}$, then $p_{\theta|y}$ will have terms up to the $(N+M)$th mode. Thus, to keep the necessary memory in a multistage process from growing in this manner, it becomes necessary to devise techniques for sequentially truncating the conditional density for $\theta$. We will not treat this problem in detail, but will make some general comments. In general, just keeping the first $N$ modes of $p_{\theta|y}$ is not an acceptable method, since we require that the truncated density be nonnegative everywhere. However, if, for
instance, $p_{\theta|y}$ is continuous, the coefficients fall off as $1/N$, and thus, for any given $\epsilon > 0$, we can choose $N$ sufficiently large so that, if we keep the first $N$ modes, the truncated density will be bounded below by $-\epsilon$. Alternatively, if $\hat{p}$ represents the truncated version of $p_{\theta|y}$ obtained by keeping only the first $N$ modes, and we then define

$$\bar{p} = \max(0, \hat{p})$$

we can take the Fourier coefficients of $\bar{p}$ as the coefficients of our approximation to $p_{\theta|y}$.
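The clipping alternative can be sketched numerically in Python. The grid-based quadrature, the list layout (index 0 unused), and the function names are assumptions of this example. The report does not say whether the clipped function should be renormalized, so the sketch simply returns its constant term alongside the other coefficients.

```python
import math

TWO_PI = 2.0 * math.pi

def series(xi, const, s, c):
    """const + sum_n (s[n] sin(n xi) + c[n] cos(n xi))."""
    return const + sum(s[n] * math.sin(n * xi) + c[n] * math.cos(n * xi)
                       for n in range(1, len(s)))

def clip_and_refit(a, b, grid=2048):
    """Fourier coefficients of max(0, p_hat), where p_hat is the truncated
    series 1/(2 pi) + sum (a[n] sin + b[n] cos). Returns (const, a_new, b_new)."""
    n_modes = len(a) - 1
    h = TWO_PI / grid
    xs = [-math.pi + j * h for j in range(grid)]
    p_bar = [max(0.0, series(x, 1.0 / TWO_PI, a, b)) for x in xs]
    const = h * sum(p_bar) / TWO_PI
    a_new = [0.0] + [h / math.pi * sum(p * math.sin(k * x) for p, x in zip(p_bar, xs))
                     for k in range(1, n_modes + 1)]
    b_new = [0.0] + [h / math.pi * sum(p * math.cos(k * x) for p, x in zip(p_bar, xs))
                     for k in range(1, n_modes + 1)]
    return const, a_new, b_new

# Hypothetical one-mode truncation that dips below zero near xi = -pi/2.
const, a_new, b_new = clip_and_refit([0.0, 0.3], [0.0, 0.0])
```

Because clipping only removes negative mass, `const` exceeds $1/2\pi$ here; a truncation that is already nonnegative is returned unchanged up to quadrature error.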

If we were to use the straightforward method of truncating the Fourier series for $p_{\theta|y}$, equations (107) through (109) can be written

$$g(\beta) = A(\beta)h \tag{110}$$

where $h$ is the vector whose elements are the Fourier coefficients of $p_\theta(\xi)$, $g(\beta)$ contains $c(\beta)$ and the $\alpha_k(\beta)$'s and $\beta_k(\beta)$'s, and $A(\beta)$ is a $(2N+1) \times 2N$ matrix (assuming we keep $N$ modes of $p_\theta$ and $p_{\theta|y}$) whose elements are the Fourier coefficients of $p_{y|\theta}$. The structure of (107), (108), and (109) is reflected in $A$ and may lead to efficient methods for evaluating (110).

Finally, we note that this approach is extremely general, in that the only restriction on the form of the measurement is that the conditional density $p_{y|\theta}$ exist. For example, in addition to measurements such as

$$y = (\theta + v) \bmod 2\pi$$

which are considered in subsection 4.1, using the Fourier series approach we can also consider measurements such as

$$y = \sin\theta + v.$$

It has recently been pointed out to the authors that Fourier analysis results for this particular measurement form were presented in ref. 47.

Applications Including FM-AM Demodulation

The previous sections have described and analyzed a class of mathematical models for which relatively simple optimal filters have been obtained. The problems considered include many inherently nonlinear ones. These results are interesting in that they provide a new way of introducing randomness into the system equations in such a manner as to lead to simple synthesis procedures for optimal estimation.

Among  the  potential  areas  of  application  are  FM  demodulation,  AM 
demodulation,  combined  AM-FM  demodulation,  optical  communication, 
frequency  stability,  and  gyroscopic  analysis. 

The usual mathematical models for the received FM signal are (refs. 31, 32, 33)

$$r(t) = A \cos\left( \omega_0 t + \int_0^t x(s)\, ds \right) + N_1(t) \tag{111}$$

or

$$r(t) = A \cos\left( \omega_0 t + \int_0^t x(s)\, ds + N_2(t) \right) \tag{112}$$

where $N_1$ and $N_2$ are noise processes (here assumed to be Brownian motion processes), and $x$ is the signal.

It is the mathematical model (112) that we will consider in this section. More detailed descriptions of and other analyses using this model can be found in references 31-39 and 45.

We remark that there are techniques for determining $\sin(\omega_0 t + \int_0^t x(s)\, ds + N_2(t))$ from $r(t)$. Using the notation of the preceding sections, we take as our observation the $2 \times 2$ orthogonal matrix

$$Z(t) = \begin{pmatrix} \cos\left(\omega_0 t + \int_0^t x(s)\, ds + N_2(t)\right) & \sin\left(\omega_0 t + \int_0^t x(s)\, ds + N_2(t)\right) \\ -\sin\left(\omega_0 t + \int_0^t x(s)\, ds + N_2(t)\right) & \cos\left(\omega_0 t + \int_0^t x(s)\, ds + N_2(t)\right) \end{pmatrix}$$

Then we can apply the estimation results of previous sections to obtain an optimal estimate for the signal $x(t)$ (see Lemma 3).

If the signal $x(t)$ is a linear diffusion process, the optimal demodulation equations take a particularly simple form. Also, if we have a multi-channel FM system, we can model it as in subsection 3.5 and use the results on filtering in abelian Lie groups to design an optimal frequency demodulator.

The theory developed in this report also has possible applications in AM modulation, joint AM-FM modulation (ref. 33, p. 628), and optical communication. The Lie group of interest in these cases is $C - \{0\}$ -- the set of nonzero complex numbers with complex multiplication as the group operation. Its (real) Lie algebra can be identified as $R^2$, and the map $\exp: R^2 \to C - \{0\}$ is defined by

$$\exp(x_1, x_2) = e^{x_1 + i x_2}. \tag{113}$$

We note that $C - \{0\} \cong R^1 \times S^1$ via the identification

$$(r, \theta) \leftrightarrow e^{r + i\theta}, \qquad r \in R,\ \theta \in [-\pi, \pi).$$

Thus $S^1$ is the subgroup of $C - \{0\}$ consisting of all complex numbers of length one, and its Lie algebra is the subalgebra of $R^2$ obtained by requiring $x_1 = 0$. We note that this representation of $S^1$ could have been used in the preceding sections, instead of the $2 \times 2$ orthogonal matrices.

Also, we see from (113) that $x_1$ controls the amplitude of $\exp(x_1, x_2)$, while $x_2$ controls the phase.

Now suppose that we have a continuous signal process on $R^2$

$$x(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}.$$

We define our measurement process, $z(t)$, as follows:

$$dy(t) = x(t)\, dt + dv(t) \tag{114}$$

$$z(t) = \exp(y_1(t), y_2(t)) = e^{y_1(t) + i y_2(t)} \tag{115}$$

where $v' = (v_1, v_2)$ is a 2-dimensional Brownian motion process, independent of $x$.

This problem clearly fits into the framework discussed in Section 3, and thus can be solved by the methods described previously -- i.e. knowledge of $z(s)$, $s \leq t$, is equivalent to knowledge of $y(s)$, $s \leq t$. In fact, we can express $dy(t)$ in terms of $z(t)$ and $dz(t)$ with the aid of the Ito differential rule:

$$dz(t) = (dy_1(t) + i\, dy_2(t))\, z(t) + \frac{q_{11}(t) + 2i q_{12}(t) - q_{22}(t)}{2}\, z(t)\, dt$$

where

$$E(dv(t)\, dv'(t)) = Q(t)\, dt, \qquad Q(t) = \begin{pmatrix} q_{11}(t) & q_{12}(t) \\ q_{12}(t) & q_{22}(t) \end{pmatrix}.$$

Thus

$$dy_1(t) = \mathrm{Re}\left\{ \frac{dz(t)}{z(t)} \right\} - \frac{q_{11}(t) - q_{22}(t)}{2}\, dt$$

$$dy_2(t) = \mathrm{Im}\left\{ \frac{dz(t)}{z(t)} \right\} - q_{12}(t)\, dt.$$
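The correction terms can be traced through a short worked calculation. The shorthand $w(t) = y_1(t) + i\,y_2(t)$ is introduced here only for this sketch, and the quadratic variations are taken from $E(dv\,dv') = Q\,dt$.

```latex
\begin{aligned}
w(t) &= y_1(t) + i\,y_2(t), \qquad z(t) = e^{w(t)},\\
dz &= z\,dw + \tfrac{1}{2}\,z\,(dw)^2,\\
(dw)^2 &= (dy_1)^2 - (dy_2)^2 + 2i\,dy_1\,dy_2
        = \bigl(q_{11} - q_{22} + 2i\,q_{12}\bigr)\,dt,\\
\frac{dz}{z} &= dy_1 + i\,dy_2 + \tfrac{1}{2}\bigl(q_{11} - q_{22} + 2i\,q_{12}\bigr)\,dt .
\end{aligned}
```

Taking real and imaginary parts of the last line and solving for $dy_1$ and $dy_2$ gives the two expressions above.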


Also note that, if we assume $y(0) = 0$, we have

$$y(t) = \int_0^t x(s)\, ds + v(t)$$

and thus

$$z(t) = e^{v_1(t) + i v_2(t)}\, e^{\int_0^t [x_1(s) + i x_2(s)]\, ds}. \tag{116}$$

We then see that our signal is both amplitude and frequency modulated, and the noise enters multiplicatively and is a complex lognormal process (ref. 40).

Thus, equation (116) yields a message model for a joint AM-FM modulation system, for which there is a simple optimal estimator. The AM case is obtained by setting $x_2 \equiv v_2 \equiv 0$. We note that our AM modulation is not the usual one -- actually $x_1(t)$ is more like an amplitude rate modulating signal. However, if we let $x_1(t) = (d/dt)\,\tilde{x}_1(t)$, where $\tilde{x}_1(t)$ is the actual signal we want transmitted, we have that the amplitude modulation is (assuming $\tilde{x}_1(0) = 0$ and $x$ is deterministic)

$$e^{\int_0^t x_1(s)\, ds} = e^{\tilde{x}_1(t)}$$

or, if $\tilde{x}_1$ is deterministic and differentiable, and we let

$$x_1(t) = \frac{(d/dt)\,\tilde{x}_1(t)}{\tilde{x}_1(t)},$$

$$e^{\int_0^t x_1(s)\, ds} = \tilde{x}_1(t)/\tilde{x}_1(0)$$

(assuming $\tilde{x}_1 > 0$). Also note that in the AM case ($x_2 \equiv 0$) we can include $v_2(t)$ as a random phase, and in the FM case ($x_1 \equiv 0$) we can include $v_1(t)$ as a random amplitude.

In optical communication theory, variations in the transmission medium -- e.g. turbulence in the atmosphere -- cause variations in the refractive index of the air. This disturbance can be modeled (ref. 40) as a lognormal noise process which multiplies the signal. In this case, this analysis (equations (113) through (116)) may prove to be helpful in the design of good receivers. In particular, these results may be useful in the case of spatially uniform noise, and, in addition, we can treat the problem with real and imaginary parts of the noise process dependent on each other ($q_{12}(t) \neq 0$, see ref. 40).

The  problem  of  frequency  stability  (refs.  41,  42,  43)  is  another  area 
of  application  of  the  results  of  this  report.  This  problem  involves  devices, 
such  as  oscillators  and  extremely  accurate  clocks,  in  which  we  wish  to 
measure  deviations  of  the  operating  frequency  from  some  ideal  or  nominal 
frequency.  In  other  words,  we  have  a  signal  of  the  form 


$$e^{i\left(\omega_0 t + \int_0^t x(s)\,ds\right)}$$

where $\omega_0$ is the fixed, ideal frequency and $x(s)$ is the random, time-varying deviation of the actual frequency from the ideal frequency. The problem is to devise a measurement and estimation system to determine the deviation $x(t)$.

There are various types of measurement processes discussed in the literature (refs. 41, 42, 43). One of the most widely mentioned involves the multiplication of the signal by the output of a second oscillator and the measurement of the beat frequency. That is, if the signal from the second oscillator is

$$e^{i(\omega_1 t - v(t))}$$

where $\omega_1$ is a fixed frequency, close to $\omega_0$, and $v(t)$ is a random deviation from $\omega_1$, our measurement essentially is

$$e^{i\left[(\omega_0 - \omega_1)t + \int_0^t x(s)\,ds + v(t)\right]}$$

If we assume that $v$ is a Brownian motion process independent of $x$, and if we subtract off the known term $(\omega_0 - \omega_1)t$, we are left with the observation equations

$$dz(t) = x(t)\,dt + dv(t)$$

$$Z(t) = e^{iz(t)}$$

which is precisely the form considered in this report. Further, if we model $x$ as a Brownian motion process or a linear diffusion process, we can use the optimal filtering equations of subsection 3.3.
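To make the frequency-stability setup concrete, the following sketch simulates the observation equations on a time grid and runs a linear filter on the recovered phase increments. It is an illustration under stated assumptions rather than the equations of subsection 3.3 verbatim: $x$ is taken to be a Brownian motion with variance rate `r`, $v$ a Brownian motion with variance rate `q`, the Kalman-Bucy equations are discretized by a plain Euler step, and the increments are assumed small enough (well inside $(-\pi, \pi)$) that the phase of $Z_{k+1}\overline{Z_k}$ recovers them without ambiguity. All numerical values and function names are hypothetical.

```python
import cmath
import math
import random


def recover_increments(Z):
    """Phase increments dz_k from samples of Z = exp(i z).

    Only valid while every true increment lies strictly inside (-pi, pi).
    """
    return [cmath.phase(Z[k + 1] * Z[k].conjugate()) for k in range(len(Z) - 1)]


def kalman_bucy_euler(dz, dt, q, r, x0=0.0, p0=1.0):
    """Euler-discretized Kalman-Bucy filter for dx = sqrt(r) du, dz = x dt + dv."""
    x_hat, p = x0, p0
    estimates = []
    for inc in dz:
        gain = p / q
        x_hat += gain * (inc - x_hat * dt)   # innovation update
        p += (r - p * p / q) * dt            # Riccati equation, Euler step
        estimates.append(x_hat)
    return estimates, p


random.seed(0)
dt, q, r, n = 1e-3, 0.05, 0.5, 20000   # hypothetical step size and noise rates
x, z = 0.3, 0.0
Z, true_dz = [cmath.exp(1j * z)], []
for _ in range(n):
    inc = x * dt + math.sqrt(q * dt) * random.gauss(0.0, 1.0)
    true_dz.append(inc)
    z += inc
    Z.append(cmath.exp(1j * z))              # only Z is "observed"
    x += math.sqrt(r * dt) * random.gauss(0.0, 1.0)

dz = recover_increments(Z)
estimates, p_final = kalman_bucy_euler(dz, dt, q, r)
print(p_final, math.sqrt(r * q))
```

With these assumptions the error variance settles at the steady-state Riccati solution $\sqrt{rq}$. If the sampling interval were coarse enough for an increment to exceed $\pi$ in magnitude, the first stage would return the wrong branch, which is the ambiguity treated in Section 4 and the Appendix.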

A final area of application is in the estimation of the angular position of a body spinning about a given axis. If we consider the single-degree-of-freedom integrating gyroscope (ref. 44, pp. 104-105), we note that the output of this device is an angle -- essentially the shift in orientation of the gyro from some reference position. The orientation of the gyro is determined by the integral of the angular velocity acting on the gyro about some fixed axis. Noise in the system is modeled as gyro drift -- an error in the angular velocity detected by the device. Using this model for the dynamics, the estimation results of this report can be used to design a system to estimate the actual angular velocity.

6. Conclusions

In this report, a class of estimation problems on the unit circle is formulated and resolved. Both the continuous time and discrete time estimation problems are considered. The signal and observation processes on the circle are constructed by taking the projection modulo $2\pi$ of the corresponding standard 1-dimensional processes. The stochastic differential equations which govern their evolution are bilinear in form. The observational noise can be viewed as entering multiplicatively.

Error criteria, probability distributions, and optimal estimates on the circle are studied. In particular, various properties of the folded normal density in connection with estimation are discussed in detail.

An effective synthesis procedure for continuous time estimation is provided. The measurement data are first processed through a nonlinear transformation. The transformed process then goes through an ordinary estimator, such as the Kalman-Bucy filter. A second nonlinear processing of the output of the ordinary estimator then yields the desired estimate. Filtering, smoothing, and prediction can all be treated in this manner, and the generalization to estimation on an arbitrary abelian Lie group presents no difficulty.
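As an illustration of the last stage of this procedure, suppose (as an assumption made only for this sketch) that the ordinary estimator delivers a normal conditional density for the unwrapped signal, with mean `x_hat` and variance `p`. Then the conditional mean of $e^{ix}$ has the closed form $e^{i\hat{x} - P/2}$, a standard property of the normal characteristic function, and its phase gives an angle estimate on the circle. The code below checks the closed form by direct numerical integration; the numbers and the function name are hypothetical.

```python
import cmath
import math


def circular_mean_of_gaussian(mean, var, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of E[exp(i x)] for x ~ N(mean, var)."""
    sd = math.sqrt(var)
    lo = mean - half_width * sd
    du = 2.0 * half_width * sd / steps
    norm = math.sqrt(2.0 * math.pi * var)
    total = 0.0 + 0.0j
    for k in range(steps):
        u = lo + (k + 0.5) * du
        density = math.exp(-(u - mean) ** 2 / (2.0 * var)) / norm
        total += cmath.exp(1j * u) * density * du
    return total


x_hat, p = 0.7, 0.4                      # hypothetical output of the linear filter
numeric = circular_mean_of_gaussian(x_hat, p)
closed_form = cmath.exp(1j * x_hat - p / 2.0)
angle_estimate = cmath.phase(numeric)    # angle in (-pi, pi]
print(numeric, closed_form, angle_estimate)
```

The modulus $e^{-P/2}$ shrinks toward zero as the variance grows, so it can be read as a rough confidence indicator for the angle estimate.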

In addition, the discrete time problem was studied, and an intrinsic difference between the continuous and discrete problems was discussed. This difference stems from the loss of information between the discrete measurements. Unlike the vector space case, this loss of information causes the expression for the conditional probability distribution to be rather cumbersome. Although suboptimal estimators can be obtained from the results of Section 4 by careful examination of the form of the equations, the increasing complexity of these equations with each additional measurement has prevented the authors from deriving recursive equations for the optimal estimate.

Applications to AM and FM demodulation, optical communication, frequency stability, and fixed axis rotation problems have been described. These practical problems provide physical justification for the proposed mathematical formulation. The application of the mathematical results of this paper is seen to lead to neat solutions and easy implementation in these practical situations.

APPENDIX


Some  Limiting  Arguments  Relating  the  Discrete 
and  Continuous  Problems 

We have seen in Section 4 that the ambiguity concerning the number of rotations leads to equations for the conditional density that involve infinite sums. Intuitively, if we observe the process continuously, this ambiguity should disappear -- assuming the random processes involved are continuous. From the rigorous arguments of Section 3, we have seen that this is the case -- i.e., in the limit we know $dy(t)$, not just $dy(t) \bmod 2\pi$. We can also see this by examining the discrete approximation to the continuous problem.

Our discrete equations are

$$\Delta\tilde{y}_k = (\Delta y_k) \bmod 2\pi = \left[m(x_k, k\Delta t)\,\Delta t + \sqrt{q(k\Delta t)}\,\Delta w_k\right] \bmod 2\pi$$

where $x_k = x(k\Delta t)$, and $x(t)$ is a continuous process, independent of the Brownian motion process $w(t)$. We also assume that $q(t)$ is continuous and that $m(x,t)$ is measurable in $x$ for all $t$ and continuous in $t$ for all $x$.

We wish to examine the effect of one additional such measurement at time $t$, in terms of the size of $\Delta t$. Thus, we assume we have computed

$$p_{x(t)}(\alpha) = p_{x(t)}(\alpha \mid \text{past measurements}) \tag{117}$$

and that we take the measurement

$$\Delta\tilde{y}_t = (\Delta y_t) \bmod 2\pi = \left[m(x(t), t)\,\Delta t + \sqrt{q(t)}\,\Delta w_t\right] \bmod 2\pi$$

As indicated in equation (117), we will suppress all conditioning on past measurements. Thus, we wish to compute $p_{x(t)\mid\Delta\tilde{y}_t}(\alpha \mid \beta)$ in terms of $p_{x(t)}(\alpha)$ and the new information $\Delta\tilde{y}_t$. (Here $\beta$ is the observed value of $\Delta\tilde{y}_t$.)

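Using the discrete measurement formulae, we have

$$p_{x(t)\mid\Delta\tilde{y}_t}(\alpha \mid \beta) = \sum_{n=-\infty}^{+\infty} c_n(\beta)\, p_{x(t)\mid\Delta y_t}(\alpha \mid \beta + 2n\pi) \tag{118}$$

where we have the explicit formula

$$\begin{aligned}
c_n(\beta) &= \frac{\displaystyle\int_{-\infty}^{+\infty} N(\beta + 2n\pi - m(u,t)\,\Delta t;\ 0,\ q(t)\,\Delta t)\, p_{x(t)}(u)\,du}{\displaystyle\sum_{r=-\infty}^{+\infty} \int_{-\infty}^{+\infty} N(\beta + 2r\pi - m(u,t)\,\Delta t;\ 0,\ q(t)\,\Delta t)\, p_{x(t)}(u)\,du} \\[1ex]
&= \frac{\displaystyle\int_{-\infty}^{+\infty} \exp\left\{-\frac{1}{2q(t)\,\Delta t}\left(\beta + 2n\pi - m(u,t)\,\Delta t\right)^2\right\} p_{x(t)}(u)\,du}{\displaystyle\sum_{r=-\infty}^{+\infty} \int_{-\infty}^{+\infty} \exp\left\{-\frac{1}{2q(t)\,\Delta t}\left(\beta + 2r\pi - m(u,t)\,\Delta t\right)^2\right\} p_{x(t)}(u)\,du}
\end{aligned} \tag{119}$$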

Examining this expression, we see that the numerator contains a term

$$\exp\left\{-\frac{2n^2\pi^2}{q(t)\,\Delta t}\right\}$$

which is $o(\Delta t^{\ell})$ $\forall\, \ell > 0$ if $n \neq 0$. Thus, one sees that for small $\Delta t$, the probability that $\Delta\tilde{y}_t$ and $\Delta y_t$ differ by a nonzero multiple of $2\pi$ appears to go to zero quite fast as $\Delta t \to 0$. To make a precise statement concerning this, we must make some technical assumptions:

(1) The probability density for $x(t)$ conditioned on the past measurements, $p_{x(t)}(\alpha)$, exists.

(2) The conditional density for $x(t)$, if we were to measure $\Delta y_t$ (not $\Delta\tilde{y}_t$), $p_{x(t)\mid\Delta y_t}(\alpha \mid \beta)$, exists and is bounded uniformly for all $\alpha$ and $\beta$.

(3) We have the following bound:

$$\int_{-\infty}^{+\infty} e^{-a^2 m^2(u,t) + \xi m(u,t)}\, p_{x(t)}(u)\,du \le K(a^2)\, e^{k(a^2)\,\xi^2} \tag{120}$$

where $K(a^2)$, $k(a^2)$ are bounded for $a^2 \in [0, \gamma]$, for some $\gamma > 0$.

We can now prove the following

Theorem 9: If the assumptions above hold, we have the following relationship among the $c_n$'s:

$$\sum_{n \neq 0} \frac{c_n(\beta)}{c_0(\beta)} = o(\Delta t^{\ell}) \quad \forall\, \ell > 0 \tag{121}$$

$$\left| p_{x(t)\mid\Delta\tilde{y}_t}(\alpha \mid \beta) - p_{x(t)\mid\Delta y_t}(\alpha \mid \beta) \right| = o(\Delta t^{\ell}) \quad \forall\, \ell > 0 \tag{122}$$

We will need the following technical lemma:

Lemma 5: Let $y(t)$ be a continuous, real-valued function of time, and define

$$\Delta\tilde{y}_t(s) = (y(s) - y(t)) \bmod 2\pi \qquad (s \ge t)$$

Let $q$ be a positive constant, $p(x)$ a measurable real-valued function on $R^1$, $h \ge 0$ an element of $L^1(R)$ such that $\|h\|_{L^1} > 0$, and $\{s_n\}$ a sequence of real numbers decreasing to $t$. Then there exists an integer $N_0$ such that

$$\int_{-\infty}^{+\infty} \exp\left\{-\frac{1}{2q}\left[p^2(x)(s_n - t) - 2\Delta\tilde{y}_t(s_n)\,p(x)\right]\right\} h(x)\,dx \ge \frac{1}{2}\|h\|_{L^1} \quad \forall\, n \ge N_0 \tag{123}$$


Proof: Let $F_n(x)$ be the integrand of the left-hand side of (123). Then, for fixed $x$,

$$\lim_{n \to \infty} F_n(x) = h(x)$$

By Fatou's Lemma ([24], [25]),

$$\liminf_{n} \int F_n\,dx \ge \int \lim_{n} F_n\,dx = \|h\|_{L^1}$$

Choose $N_0$ such that, for $m \ge N_0$,

$$\int F_m\,dx \ge \liminf_{n} \int F_n\,dx - \frac{1}{2}\|h\|_{L^1}$$

Then

$$\int F_m\,dx \ge \frac{1}{2}\|h\|_{L^1} \quad \forall\, m \ge N_0$$


Proof of Theorem 9: The proof of this result requires some straightforward but rather lengthy computations. Thus, we shall only sketch the proof, leaving the details to the interested reader.

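Consider the infinite sum

$$d(\beta) = \sum_{n \neq 0} \int_{-\infty}^{+\infty} \exp\left\{-\frac{1}{2q(t)\,\Delta t}\left[2\beta\left(2n\pi - m(u,t)\,\Delta t\right) + \left(2n\pi - m(u,t)\,\Delta t\right)^2\right]\right\} p_{x(t)}(u)\,du$$

Using (120), we have

$$d(\beta) \le \sum_{n \neq 0} \exp\left\{-\frac{n\pi(2n\pi + 2\beta)}{q(t)\,\Delta t}\right\} K\!\left(\frac{\Delta t}{2q(t)}\right) \exp\left\{k\!\left(\frac{\Delta t}{2q(t)}\right) \frac{(2n\pi + \beta)^2}{q^2(t)}\right\}$$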

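We now note that $w(t)$ is a continuous random process, and, therefore, we assume that we are given a continuous sample path, $w^0(t)$. We then choose $\delta > 0$, $K_0 > 0$, $k_0 > 0$ such that

$$K\!\left(\frac{\Delta t}{2q(t)}\right) \le K_0, \qquad k\!\left(\frac{\Delta t}{2q(t)}\right) \le k_0, \qquad |\Delta\tilde{y}_t^0| \le \frac{\pi}{2}$$

for all $\Delta t < \delta$ (here $\Delta\tilde{y}_t^0$ is the value for the particular sample path $w^0(t)$ selected). Then, for $\Delta t < \delta$,

$$d(\Delta\tilde{y}_t^0) \le K_0 \sum_{n \neq 0} \exp\left\{-\frac{(2n^2 - |n|)\pi^2}{q(t)\,\Delta t} + \frac{k_0\left(2|n| + \tfrac{1}{2}\right)^2 \pi^2}{q^2(t)}\right\} \tag{124}$$

and for $\Delta t < \min(\delta, q(t)/2k_0)$, the right-hand side of (124) is finite.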
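Examining equation (119), we can write

$$\frac{1}{\Delta t^{\ell}} \sum_{n \neq 0} \frac{c_n(\Delta\tilde{y}_t^0)}{c_0(\Delta\tilde{y}_t^0)} = \frac{1}{\Delta t^{\ell}}\, \frac{d(\Delta\tilde{y}_t^0)}{\displaystyle\int_{-\infty}^{+\infty} \exp\left\{-\frac{1}{2q(t)}\left[m^2(u,t)\,\Delta t - 2\Delta\tilde{y}_t^0\, m(u,t)\right]\right\} p_{x(t)}(u)\,du}$$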


Taking a sequence $\{\Delta t_r\}_{r=1}^{\infty}$ decreasing to zero and using Lemma 5, we see that there is an $R_0$ such that

$$\int_{-\infty}^{+\infty} \exp\left\{-\frac{1}{2q(t)}\left[m^2(u,t)\,\Delta t_r - 2\Delta\tilde{y}_t^0(r)\, m(u,t)\right]\right\} p_{x(t)}(u)\,du \ge \frac{1}{2} \quad \forall\, r \ge R_0$$

where

$$\Delta\tilde{y}_t^0(r) = \left(m(x(t), t)\,\Delta t_r + \sqrt{q(t)}\,\Delta w_r^0\right) \bmod 2\pi$$

and

$$\Delta w_r^0 = w^0(t + \Delta t_r) - w^0(t)$$


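Then, for $r \ge R_0$,

$$\frac{1}{\Delta t_r^{\ell}} \sum_{n \neq 0} \frac{c_n(\Delta\tilde{y}_t^0(r))}{c_0(\Delta\tilde{y}_t^0(r))} \le \frac{2\, d(\Delta\tilde{y}_t^0(r))}{\Delta t_r^{\ell}} \tag{125}$$

Using (124), it can easily be shown that for any $\varepsilon > 0$, there exists a positive integer $r(\varepsilon)$ such that

$$\frac{1}{\Delta t_r^{\ell}} \sum_{n \neq 0} \frac{c_n(\Delta\tilde{y}_t^0(r))}{c_0(\Delta\tilde{y}_t^0(r))} < \varepsilon \quad \forall\, r \ge r(\varepsilon)$$

Thus

$$\lim_{\Delta t \to 0} \frac{1}{\Delta t^{\ell}} \sum_{n \neq 0} \frac{c_n(\Delta\tilde{y}_t^0)}{c_0(\Delta\tilde{y}_t^0)} = 0$$

for any continuous sample function $w^0$.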


To prove (122), we use the assumption that $p_{x(t)\mid\Delta y_t}(\alpha \mid \beta)$ is bounded for all $\beta$ and $\alpha$. Let $M$ be an upper bound. Then, rewriting equation (118) for the particular sample path chosen, we have

$$\begin{aligned}
p_{x(t)\mid\Delta\tilde{y}_t}(\alpha \mid \Delta\tilde{y}_t^0) = p_{x(t)\mid\Delta y_t}(\alpha \mid \Delta\tilde{y}_t^0) &- \frac{\sum_{r \neq 0} c_r/c_0}{1 + \sum_{r \neq 0} c_r/c_0}\; p_{x(t)\mid\Delta y_t}(\alpha \mid \Delta\tilde{y}_t^0) \\
&+ \sum_{n \neq 0} \frac{c_n/c_0}{1 + \sum_{r \neq 0} c_r/c_0}\; p_{x(t)\mid\Delta y_t}(\alpha \mid \Delta\tilde{y}_t^0 + 2n\pi)
\end{aligned}$$

where each $c_n$ is evaluated at $\Delta\tilde{y}_t^0$.


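Thus

$$\left| p_{x(t)\mid\Delta\tilde{y}_t}(\alpha \mid \Delta\tilde{y}_t^0) - p_{x(t)\mid\Delta y_t}(\alpha \mid \Delta\tilde{y}_t^0) \right| \le 2M\, \frac{\sum_{n \neq 0} c_n/c_0}{1 + \sum_{n \neq 0} c_n/c_0} = o(\Delta t^{\ell}) \quad \forall\, \ell > 0$$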


We  note  that  Theorem  9  may  still  be  true  even  if  equation  (120) 
is  not  satisfied.  An  examination  of  the  proof  shows  that  all  we  require 
is  the  following  :  let 


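$$f(a^2, \xi) = \int_{-\infty}^{+\infty} e^{-a^2 m^2(u,t) + \xi m(u,t)}\, p_{x(t)}(u)\,du$$

Then we must have

$$\sum_{n \neq 0} \exp\left\{-\frac{n\pi(2n\pi + 2\Delta\tilde{y}_t^0)}{q(t)\,\Delta t}\right\} f\!\left(\frac{\Delta t}{2q(t)},\ \frac{2n\pi + \Delta\tilde{y}_t^0}{q(t)}\right) = o(\Delta t^{\ell}) \quad \forall\, \ell > 0 \tag{126}$$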


for given $K_1$, $K_2$, ..., which depend only on $a$ and are bounded as $a \to 0$, will satisfy (126). Thus, for example, Theorem 9 holds if $m$ is linear and $p_{x(t)}$ is normal.
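The linear-normal case just mentioned lends itself to a quick numerical illustration of (121). As a sketch under assumptions of our own choosing, take $m(u,t) = u$ and $p_{x(t)}$ normal with mean `mu` and standard deviation `sigma`. The unwrapped increment is then normal with mean $\mu\,\Delta t$ and variance $q\,\Delta t + \sigma^2 \Delta t^2$, so each integral in (119) is a normal density evaluated at $\beta + 2n\pi$ and the ratio $c_n/c_0$ has a closed form. The function name, the truncation `n_max`, and all numerical values below are hypothetical.

```python
import math


def wrap_ratio(beta, dt, q, mu, sigma, n_max=50):
    """Sum over n != 0 of c_n(beta) / c_0(beta) for m(u, t) = u and a N(mu, sigma^2) prior."""
    var = q * dt + (sigma * dt) ** 2       # variance of the unwrapped increment
    centre = beta - mu * dt                # offset of beta from the increment mean
    total = 0.0
    for n in range(-n_max, n_max + 1):
        if n == 0:
            continue
        shifted = centre + 2.0 * math.pi * n
        total += math.exp(-(shifted ** 2 - centre ** 2) / (2.0 * var))
    return total


q, mu, sigma, beta, ell = 1.0, 0.2, 1.0, 0.3, 5   # hypothetical values
steps = (1.0, 0.5, 0.25, 0.125)
scaled = [wrap_ratio(beta, dt, q, mu, sigma) / dt ** ell for dt in steps]
for dt, value in zip(steps, scaled):
    print(dt, value)
```

Even after division by $\Delta t^5$, the printed values fall off sharply as $\Delta t$ is halved, which is the behavior (121) asserts for every power $\ell$.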


